Abstract

We address combinatorial problems that can be formulated as minimization of a partially separable function of 
discrete variables (energy minimization in graphical models, weighted constraint satisfaction, pseudo-Boolean 
optimization, 0-1 polynomial programming). For polyhedral relaxations of such problems it is generally not
true that variables which are integral in the relaxed solution retain the same values in the optimal discrete solution.
Those which do are called persistent. Such persistent variables define a part of a globally optimal solution.
Once identified, they can be excluded from the problem, reducing its size.

To any polyhedral relaxation we associate a sufficient condition proving persistency of a subset of variables. 
We set up a specially constructed linear program which determines the set of persistent variables maximal 
with respect to the relaxation. The condition improves as the relaxation is tightened and possesses all its 
invariances. The proposed framework explains a variety of existing methods originating from different areas of 
research and based on different principles. A theoretical comparison is established that relates these methods
to the standard linear relaxation and proves that the proposed technique identifies the same or a larger set of
persistent variables.
1. Introduction

Optimization models in the general form of minimizing a partially separable function of discrete variables, known as energy minimization, weighted/valued constraint satisfaction or max-sum labeling, have proved useful in many areas. The function has the form $E_f(x) = \sum_{c\in\mathcal{E}} f_c(x_c)$. In computer vision and machine learning such models are largely motivated by maximum a posteriori inference in graphical models [71] used to model a variety of structured statistical recognition problems. In case variables take only two values, 0 or 1, the problem is known as pseudo-Boolean optimization or 0-1 polynomial programming. Problems where terms (summands) involve at most two 0-1 variables at a time are called quadratic. We consider the general case, where terms may couple more than two variables at a time (higher order) and variables can take more than two values (multilabel).

One major trend for performing inference in graphical models is represented by graph cut methods. The basic capability is essentially to solve a binary pairwise submodular problem, e.g., image segmentation [18], by reduction to a minimum cut / maximum flow problem. For the latter, many efficient algorithms exist and their running time is experimentally near linear for typical vision problems [9]. This basic method was extended to submodular multilabel problems [21, 51], to general multilabel problems by solving for an optimized crossover between two candidate solutions at a time [10], to higher-order 0-1 models reducible to a graph cut [34, 16] and to combinations of higher order and multilabel [41, 13].

Another technique that can nowadays be considered a basic graph cut method is the roof dual relaxation [6], known in computer vision as quadratic pseudo-Boolean optimization (QPBO) [32]. It allows one to find a partial optimal solution to a non-submodular binary problem and reduces to finding a minimum cut in a specially constructed network [7]. It can be interpreted [28] as solving a submodular relaxation of the initial problem. This basic method was again extended to multilabel problems by solving crossover problems [42] and to general higher order 0-1 problems by reduction (quadratization) techniques expressing the function as a quadratic function with auxiliary variables [22, 15, 5].

Another direction of extending graph cuts to higher order models relies on minimization of more general submodular functions. Several efficient max-flow based algorithms have been proposed [4, 29] for minimization of a sum of submodular functions (SoS). A natural extension of QPBO is represented by submodular and bisubmodular relaxations [28, 24].

Arguably, linear programming (LP) is a much more costly tool than computing a minimum cut. Yet, it provides theoretical insight into many methods [36, 37], and solvers have been developed that can address (sometimes approximately) large scale problems. Dual decomposition methods [53, 35] or dual block-descent methods, in particular TRW-S [26], are competitive with graph cut based methods in terms of speed and quality. There are extensions of these specialized LP methods to higher order models [33, 31]. Smoothing [49] and proximal [47] methods are scalable and offer a theoretically guaranteed convergence speed. Cutting plane approaches [60, 73] are used to tighten the relaxation adaptively to the problem.

One drawback of relaxation based methods is that the final discrete solution is obtained by so-called rounding schemes and is often inferior to solutions found by graph cut methods, which stay within the discrete feasible set. Even when many of the relaxed variables take integer values in the optimal relaxed solution, a fundamental problem remains: they may not take the same integer values in the optimal discrete solution. Therefore, unless the relaxation is tight, a local rounding technique cannot provide any guarantees for general models. The situation is dramatically different when we consider quadratic pseudo-Boolean functions. There, all variables that are integer in the relaxation correspond to at least one globally optimal discrete solution [44, 20]. This property of the relaxation is called persistency. For general 0-1 polynomial problems persistency was studied by [43, 2]. In their terminology persistency is associated with relaxations and is a property of the relaxed solution as a whole. In this work we call any partial assignment of a subset of discrete variables persistent if it can be provably extended to a globally optimal solution based on the properties of the relaxation or any other sufficient condition. The success of relaxation based exact methods such as [50] on computer vision and machine learning problems suggests that often a large part of the relaxed solution is integral. In this case we are interested in determining the largest subset of such variables that is persistent.

Related Work The present work was to a large extent inspired by simple local sufficient conditions proposed in bioinformatics under the name dead end elimination (DEE) [14, 17]. In computer vision, several methods were proposed that identify a persistent assignment directly in the multilabel setting. These are the methods by Kovtun [39, 40] and Swoboda et al. [63, 64]. Method [64] is applicable with a general polyhedral relaxation and it maximizes the subset of persistent variables for their sufficient condition (discussed in §4.5).

In the case of 0-1 variables there are several different techniques. Adams et al. [2] proposed a sufficient condition on dual multipliers to prove persistency of the integral part of the relaxation. Quadratization techniques by Ishikawa [22], Fix et al. [15], Boros and Gruber [5] introduce auxiliary variables in order to reduce the function to a quadratic form and infer persistency from the QPBO method. Lu and Williams [43] generalized the roof duality approach to higher order by using a higher order linear relaxation. Kolmogorov [28] generalized both QPBO and the construction by Lu and Williams [43] by proposing discrete submodular and bisubmodular relaxations. He argues that the key property of QPBO that needs to be generalized is the existence of a totally half-integral optimal solution to the relaxation, i.e., with values in $\{0, \frac{1}{2}, 1\}$. He characterized all totally half-integral relaxations as bisubmodular relaxations. Finding a good (bi)submodular relaxation appears to be a challenging problem. To our knowledge it was only resolved for the special case of mincut-reducible relaxations [24, 62], and even in this case it requires solving a series of linear programs. Even though the relaxed problem itself can be efficiently optimized (in particular when it is a sum of submodular functions), obtaining a sound persistency result at a comparable computational cost is an open problem. No theoretical comparison seems to be possible between (bi)submodular relaxations and quadratization techniques [28].

Kohli et al. [25] reduce multilabel pairwise problems to 0-1 quadratic ones and Windheuser et al. [75] reduce multilabel higher order problems to submodular relaxations of [28].

Contribution In this work we settle the persistency capabilities achievable with a general polyhedral relaxation. The previously known results are in a certain sense unique, relying on a specific sufficient condition or on a specific type of the relaxation. We show that persistency guarantees are not that rare. To any polyhedral relaxation we associate clear sufficient conditions for persistency. We propose a polynomial time method to determine the largest strongly persistent subset of variables according to the sufficient condition. The method sets up a linear program connected to the given relaxation polytope and maximizes the number of strongly persistent variables. In comparison to QPBO-based or submodularity-based techniques, we employ a more costly optimization tool, but gain the following advantages:

• the new sufficient condition generalizes a wide variety of existing methods that span different fields of research and apply different techniques;

• it is possible to pose formally and solve (under certain restrictions) the problem of determining the largest subset of persistent variables;

• the maximum w.r.t. the proposed general sufficient condition is guaranteed to be at least as good as any of the individual methods or their combinations;

• the method is invariant to the permutation of labels and reparametrization of the problem as long as the relaxation is invariant;

• persistent assignments form a hierarchy when tightening the relaxation.

The author’s previous work [54, 55] considered only pairwise models and the standard LP relaxation. This paper generalizes to higher order and arbitrary polyhedral relaxations, gives more complete proofs of some properties and establishes comparisons with a novel multilabel method [64] and higher-order 0-1 methods [28, 22, 15, 2].

Outline In §2 we propose a general approach to persistency with a general polyhedral relaxation. This includes the proposed linear program formulation of maximum persistency and general properties of the problem. In §3 we consider standard LP relaxations and specialize the construction for this case, where many properties simplify. In §4 we propose a theoretical comparison between the proposed framework and other approaches. In §5 we validate our theoretical findings experimentally and compare performance on small random problems. In §6 there is a conclusion and discussion, and §A contains proofs.

1.1. Notation 

When generalizing to higher order models many statements and proofs simplify if we use a properly defined notion of the empty sum, the empty product and the empty Cartesian product. They are respectively: $\sum_{i\in\emptyset} x_i = 0$, $\prod_{i\in\emptyset} x_i = 1$ and $\prod_{i\in\emptyset}\mathcal{X}_i = \{\emptyset\}$. The inclusion $\subset$ is non-strict and $\subsetneq$ is strict. LHS refers to the left hand side of an equation. $[\cdot]$ denotes the Iverson bracket, $\operatorname{argmin}$ is the set of minimizers. Sets $\mathbb{R}$, $\mathbb{R}_+$ denote real and non-negative real numbers and $\mathbb{B} = \{0,1\}$ is the Boolean domain. A composition of functions is denoted as $(f\circ g)(x) = f(g(x))$. Finally, a polytope is assumed to be convex but may be unbounded, and a polyhedron means the same as a polytope.
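As an informal sanity check of these conventions (an illustration only, not part of the method), Python's standard library happens to follow the same ones: the empty sum is 0, `math.prod` of an empty iterable is 1, and `itertools.product()` called with no arguments yields exactly one element, the empty tuple, which plays the role of $\{\emptyset\}$. A minimal sketch:

```python
import itertools
import math

empty_sum = sum([])                          # sum over the empty index set: 0
empty_prod = math.prod([])                   # product over the empty index set: 1 (Python >= 3.8)
empty_cartesian = list(itertools.product())  # exactly one element: the empty tuple ()

# Consequence used later: a hyperedge with no nodes has exactly one assignment,
# so the constant term f_emptyset is a single number.
label_sets_of_empty_edge = []
num_assignments = math.prod(len(X_s) for X_s in label_sets_of_empty_edge)
```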

1.2. Energy Minimization 

A hypergraph $(\mathcal{V},\mathcal{E})$ is given by the set of nodes $\mathcal{V}$ and the set of hyperedges $\mathcal{E}\subset 2^{\mathcal{V}}$. We assume that $\mathcal{V}$ is totally ordered and each hyperedge $c\in\mathcal{E}$ is identified with the tuple of elements of $c$ ordered w.r.t. the total order of $\mathcal{V}$. We will further assume that $\emptyset\in\mathcal{E}$ and $(\forall s\in\mathcal{V})\ \{s\}\in\mathcal{E}$. Let $\mathcal{X}_s$ be a finite set of labels associated to a node $s\in\mathcal{V}$. For a subset of nodes $C\subset\mathcal{V}$ the set $\mathcal{X}_C$ denotes the Cartesian product $\prod_{s\in C}\mathcal{X}_s$ in the order defined on $\mathcal{V}$, and $\mathcal{X} = \mathcal{X}_{\mathcal{V}}$. An assignment of labels to all nodes, $x\in\mathcal{X}$, is called a labeling. Let $x_C$ denote the restriction of $x$ to $C\subset\mathcal{V}$ (thus $x_s$ is just a single coordinate) and $x_\emptyset = \emptyset$. Let us define the following functions (terms):

$$(\forall c\in\mathcal{E})\quad f_c\colon\mathcal{X}_c\to\mathbb{R}\qquad\text{(general hyperedge term)}.$$

The special cases read

$$\begin{aligned} f_\emptyset &\colon \{\emptyset\}\to\mathbb{R} && \text{(constant term)},\\ f_s &\colon \mathcal{X}_s\to\mathbb{R} && \text{(unary / 0 order term)},\\ f_{\{s,t\}} &\colon \mathcal{X}_{\{s,t\}}\to\mathbb{R} && \text{(pairwise / 1st order term)} \end{aligned}$$

and so on. The constant term $f_\emptyset$ is nothing but a single number. The energy function $E_f\colon\mathcal{X}\to\mathbb{R}$ is defined by

$$E_f(x) = \sum_{c\in\mathcal{E}} f_c(x_c). \tag{2}$$

It is a partially separable function of discrete variables $x$. In this paper we will use a graphical notation of the energy explained in Figure 1.

Figure 1: Graphical notation by Shlezinger [59]. Variables $x_s, x_t, x_{t'}$ are depicted as boxes, their possible states as circles and states of pairs of variables as lines. In the first order (pairwise) model the energy of a labeling $x$ is the sum of the selected unary and pairwise costs.

The general energy minimization problem is NP-hard to approximate². On the other hand, there are tractable subclasses. Works by Thapper and Živný [66, 67] and Kolmogorov [30] characterized all languages of energy functions with terms from a fixed finite set and unrestricted structure. They showed that there are no tractable languages other than those that can be solved by the basic LP relaxation (defined in §3), which shows that the relaxation is a universal and powerful technique.

² E.g., inapproximability of the traveling salesman problem [45].

1.3. General Polyhedral Relaxation

In this section we embed the energy minimization problem into a Euclidean space. A labeling $x$ is represented as a 0-1 vector in order to linearize the energy and write it as a scalar product of this vector with the cost vector $f$ consisting of all components $f_c(x'_c)$ for $c\in\mathcal{E}$, $x'_c\in\mathcal{X}_c$. According to these components let us define the set of indices $\mathcal{I} = \{(c,x'_c)\mid c\in\mathcal{E},\ x'_c\in\mathcal{X}_c\}$. The embedding $\delta\colon\mathcal{X}\to\mathbb{R}^{\mathcal{I}}\colon x\mapsto\delta(x)$ is defined by its components

$$(\forall c\in\mathcal{E},\ x'_c\in\mathcal{X}_c)\quad \delta(x)_c(x'_c) = [x_c = x'_c]. \tag{3}$$

The special cases read

$$\delta(x)_\emptyset = 1, \tag{4a}$$

$$\delta(x)_s(x'_s) = [x_s = x'_s], \tag{4b}$$

$$\delta(x)_{\{s,t\}}(x'_{\{s,t\}}) = [x_s = x'_s][x_t = x'_t] \tag{4c}$$

and so on. Let $\langle\cdot,\cdot\rangle$ denote the scalar product in $\mathbb{R}^{\mathcal{I}}$. We can write the energy using the embedding $\delta$ as a linear function:

$$E_f(x) = \sum_{c\in\mathcal{E}}\sum_{x'_c\in\mathcal{X}_c} f_c(x'_c)\,\delta(x)_c(x'_c) = \langle f,\delta(x)\rangle. \tag{5}$$
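The embedding (3) and the identity (5) can be checked mechanically on a small example. The sketch below is illustrative only: the three-node model, its hyperedges and the cost formula are hypothetical, hyperedges are stored as sorted tuples of nodes, and the empty tuple stands for the empty hyperedge carrying $f_\emptyset$.

```python
import itertools

# Hypothetical model: nodes 0, 1, 2 with 2, 3 and 2 labels.
labels = {0: [0, 1], 1: [0, 1, 2], 2: [0, 1]}
# Hyperedges as sorted node tuples; () is the empty hyperedge (constant term).
edges = [(), (0,), (1,), (2,), (0, 1), (1, 2)]

def assignments(c):
    """X_c: Cartesian product of the label sets of the nodes in c (in node order)."""
    return list(itertools.product(*(labels[s] for s in c)))

# Index set I = {(c, x'_c)}: one coordinate of R^I per hyperedge assignment.
index = [(c, xc) for c in edges for xc in assignments(c)]

def delta(x):
    """Embedding (3): delta(x)_c(x'_c) = [x_c == x'_c], as a list aligned with `index`."""
    return [1 if tuple(x[s] for s in c) == xc else 0 for (c, xc) in index]

# Arbitrary illustrative costs f_c(x'_c).
f = {(c, xc): (len(c) + 1) * sum(xc) - len(c) for (c, xc) in index}
f_vec = [f[i] for i in index]

def energy(x):
    """Energy (2): sum over hyperedges of f_c(x_c)."""
    return sum(f[(c, tuple(x[s] for s in c))] for c in edges)

def scalar(u, v):
    return sum(a * b for a, b in zip(u, v))

all_labelings = list(itertools.product(*(labels[s] for s in sorted(labels))))
for x in all_labelings:
    assert energy(x) == scalar(f_vec, delta(x))  # identity (5)
```

Because the block of $\delta(x)$ belonging to one hyperedge contains exactly one 1, the scalar product simply picks the selected term $f_c(x_c)$, which is exactly the sum (2).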
The embedding $\delta$ is illustrated in Figure 2. The energy minimization can be expressed as:

$$\min_{x\in\mathcal{X}} E_f(x) = \min_{\mu\in\delta(\mathcal{X})}\langle f,\mu\rangle = \min_{\mu\in\mathcal{M}}\langle f,\mu\rangle, \tag{6}$$

where $\delta(\mathcal{X})$ is the image of the set of labelings, i.e., the set of corresponding points in $\mathbb{R}^{\mathcal{I}}$, and $\mathcal{M} = \operatorname{conv}\delta(\mathcal{X})$ is their convex hull, called the marginal polytope [71]. The second equality holds because a linear function attains its minimum over a polytope at one of its vertices, all of which belong to $\delta(\mathcal{X})$; moreover, a convex combination of minimizers is again a minimizer. The polytope $\mathcal{M}$ has in general exponentially many facets. A relaxation of the problem is obtained by replacing $\mathcal{M}$ with an outer approximation $\Lambda\supset\mathcal{M}$:

$$\min_{\mu\in\Lambda}\langle f,\mu\rangle. \tag{7}$$

A vector $\mu\in\Lambda\subset\mathbb{R}^{\mathcal{I}}$ will be called a relaxed labeling. We will consider polyhedral relaxations of the following general form:

$$\Lambda = \{\mu\in\mathbb{R}^{\mathcal{I}}\mid A\mu\ge 0;\ \mu_\emptyset = 1;\ \mu\ge 0\}, \tag{8}$$

where we assume that $A\in\mathbb{R}^{m\times|\mathcal{I}|}$ is such that $\Lambda$ is bounded and $\mathcal{M}\subset\Lambda$. Since $\mathcal{M}$ is non-empty, it follows that $\Lambda$ is non-empty. By these assumptions, relaxation (7) over the polytope (8) is a feasible and bounded linear program. Note that general inhomogeneous equality and inequality constraints can be represented in this form by utilizing the component $\mu_\emptyset$. The dual problem to (7) and the conical hull of $\Lambda$ are expressed conveniently as follows. Recall that for a convex set $\Lambda\subset\mathbb{R}^{\mathcal{I}}$ its conical hull is the set:

$$\operatorname{coni}(\Lambda) = \{\alpha\mu\mid\mu\in\Lambda,\ \alpha\ge 0\}. \tag{9}$$

Lemma 1.1. The conical hull of a relaxation polytope $\Lambda$ (in the form (8), non-empty and bounded) is obtained by dropping the constraint $\mu_\emptyset = 1$:

$$\operatorname{coni}(\Lambda) = \{\mu\in\mathbb{R}^{\mathcal{I}}\mid A\mu\ge 0;\ \mu\ge 0\}. \tag{10}$$

Proof on p. 23.

The linear program (7) and its dual are expressed as

$$\min_{\substack{A\mu\ge 0\\ \mu_\emptyset = 1\\ \mu\ge 0}}\langle f,\mu\rangle \;=\; \max_{\substack{\varphi\in\mathbb{R}^m_+,\ \psi\in\mathbb{R}\\ f - A^\top\varphi - e_\emptyset\psi\,\ge\,0}}\psi, \tag{LP}$$

where the vector $e_\emptyset\in\mathbb{R}^{\mathcal{I}}$ is the basis vector for the component $\emptyset$, and the equality between the primal and the dual formulations holds because the primal problem is feasible and bounded. Let us introduce the notation $f^\varphi := f - A^\top\varphi$. Later on, when we consider equality constraints of the form $A\mu = 0$, the vector $f^\varphi$ will obtain the meaning of an equivalent problem; for now it is just an abbreviation.

Figure 2: The mapping $\delta$ embeds discrete labelings as points in the space $\mathbb{R}^{\mathcal{I}}$. Left: 2 variables with 2 states; lines of different colors show possible assignments. Right: to each labeling $x$ there corresponds a point $\delta(x)\in\mathbb{R}^{\mathcal{I}}$. Axes x, y, z in the figure correspond respectively to $\delta_s(1)$, $\delta_t(1)$ and $\delta_{st}(1,1)$. In this representation the energy function is a linear functional. The minimization domain can be extended equivalently from the set of points $\delta(\mathcal{X})$ to their convex hull, the marginal polytope $\mathcal{M}$.

2. Maximum Persistency 

A partial assignment $y_A\in\mathcal{X}_A$, where $A\subset\mathcal{V}$, is called weakly persistent if there exists an optimal solution $x$ such that $x_A = y_A$. In other words, $y_A$ can be extended to a global solution. The partial assignment $y_A$ is called strongly persistent if $x_A = y_A$ holds for all optimal solutions $x$.

It may seem that there is no practical reason to distinguish strongly and weakly persistent partial assignments as long as they allow one to simplify the problem. However, it will become clear later that they have different theoretical properties, leading to polynomially solvable versus NP-hard maximum persistency problems. It turns out that strong persistency is more tractable, whereas proofs are generally easier to obtain in the weak form and most results in the literature deliver weak persistency.

In the case of quadratic pseudo-Boolean functions the roof dual relaxation [6] is persistent: for any relaxed solution its integral part defines a partial assignment $y_A$ which is part of an optimal solution to the discrete problem. Moreover, for any labeling $x$, not necessarily optimal, replacing the part of $x$ on $A$ with $y_A$ (the overwrite operation, denoted in [6] by $x[A\leftarrow y]$) has the following autarky property:

$$(\forall x\in\mathcal{X})\quad E_f(x[A\leftarrow y])\le E_f(x), \tag{11}$$

illustrated in Figure 3. We will generalize this property to the multilabel setting.
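The autarky property (11) can be checked by brute force on a tiny instance. The check below enumerates all $2^n$ labelings, which is only feasible for toy sizes (verifying autarky is NP-hard in general [8]). The energy is a hypothetical quadratic pseudo-Boolean function chosen for illustration, not an example from the paper.

```python
import itertools

def E(x):
    """Hypothetical quadratic pseudo-Boolean energy of three 0-1 variables."""
    x0, x1, x2 = x
    return 2 * x0 + x1 - x2 + 3 * x0 * x1 - 2 * x1 * x2

def overwrite(x, A, y):
    """The overwrite operation x[A <- y]: take y on the nodes in A, keep x elsewhere."""
    return tuple(y[s] if s in A else x[s] for s in range(len(x)))

def is_autarky(A, y, n=3):
    """Brute-force check of property (11) over all 2^n labelings (exponential)."""
    return all(E(overwrite(x, A, y)) <= E(x)
               for x in itertools.product((0, 1), repeat=n))
```

Here setting $x_0 = 0$ and setting $x_2 = 1$ are autarkies (every term containing $x_0$ has a positive coefficient, every term containing $x_2$ a negative one), and the unique minimizer $(0, 1, 1)$ agrees with both; no overwrite of $x_1$ alone is an autarky.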


Figure 3: Quadratic pseudo-Boolean case (roof dual). There exists a half-integral optimal solution to the relaxation (indicated by numbers). Its integer part (the assignment indicated by 1's) is persistent, i.e., optimal to the original discrete optimization problem. The arrows show how an arbitrary labeling can be changed in order to improve the energy: switch to the optimal assignment for integer nodes and keep the assignment of the remaining nodes.



Figure 4: Improving mapping is a generalization of autarky. To every variable $s$ there is an associated mapping $p_s\colon\mathcal{X}_s\to\mathcal{X}_s$, shown by arrows. For any labeling $x$ its image $p(x)$ has the same or better energy: $E_f(p(x))\le E_f(x)$. If label $i$ is not in the range $p_s(\mathcal{X}_s)$ then it can be eliminated, as shown by crosses.

2.1. Improving Mapping 

The overwrite operation discussed above can be represented by a discrete mapping $p\colon\mathcal{X}\to\mathcal{X}\colon x\mapsto x[A\leftarrow y]$. The following generalization of autarky to an arbitrary mapping is proposed.

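Definition 2.1. A mapping $p\colon\mathcal{X}\to\mathcal{X}$ is called (weakly) improving for $f$ if

$$(\forall x\in\mathcal{X})\quad E_f(p(x))\le E_f(x), \tag{12}$$

and strictly improving if

$$(\forall x\in\mathcal{X})\quad (p(x)\ne x)\;\Rightarrow\;E_f(p(x)) < E_f(x). \tag{13}$$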

The idea of the improving mapping is illustrated in Figure 4. It easily follows from the definition that if $p$ is improving then there exists an optimal solution $x\in p(\mathcal{X})$ (if $x$ is optimal, so is $p(x)$), and if $p$ is strictly improving then all optimal solutions are contained in $p(\mathcal{X})$. In this way an improving mapping reduces the search space from $\mathcal{X}$ to $p(\mathcal{X})$.
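Definition 2.1 can likewise be checked by enumeration on a toy multilabel model. In the sketch below the two-node model with a Potts pairwise term and both maps are hypothetical; a node-wise map is stored as one list per node, with `p[s][i]` $= p_s(i)$.

```python
import itertools

# Hypothetical two-node model with labels {0, 1, 2}: unary costs plus a Potts pairwise term.
f0 = [0, 1, 5]
f1 = [2, 0, 4]

def E(x):
    return f0[x[0]] + f1[x[1]] + (1 if x[0] != x[1] else 0)

def apply_map(p, x):
    """Node-wise mapping: p[s] is a list with p[s][i] = p_s(i)."""
    return tuple(p[s][x[s]] for s in range(len(x)))

X = list(itertools.product(range(3), repeat=2))

def is_improving(p):
    """Condition (12), checked by enumeration of all labelings."""
    return all(E(apply_map(p, x)) <= E(x) for x in X)

def is_strictly_improving(p):
    """Condition (13): every labeling actually moved by p gets strictly lower energy."""
    return all(E(apply_map(p, x)) < E(x) for x in X if apply_map(p, x) != x)

p = [[0, 1, 0], [0, 1, 1]]  # label 2 is sent to 0 in node 0 and to 1 in node 1
q = [[1, 1, 2], [0, 1, 2]]  # label 0 is sent to 1 in node 0

best = min(E(x) for x in X)
optima = [x for x in X if E(x) == best]
image = {apply_map(p, x) for x in X}
```

The map `p` lowers the unary cost by at least 4 whenever it moves a labeling, which outweighs the Potts term of weight 1, so it is strictly improving and both optimal labelings $(0,1)$ and $(1,1)$ lie in $p(\mathcal{X})$. The map `q` is not improving: it moves $(0,0)$, of energy 2, to $(1,0)$, of energy 4.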

We will consider node-wise mappings, of the form $p(x)_s = p_s(x_s)$, where $(\forall s\in\mathcal{V})\ p_s\colon\mathcal{X}_s\to\mathcal{X}_s$. Furthermore, we restrict ourselves to idempotent mappings, i.e., satisfying $p\circ p = p$. This restriction is without loss of generality. Indeed, for an improving node-wise mapping $p$ its compositional power $p^k$ will be idempotent for some $k$ (e.g., for $k = (\max_s|\mathcal{X}_s|)!$, which turns all cycles in the map into the identity) and provides an equally good or better reduction with $p^k(\mathcal{X})\subset p(\mathcal{X})$. Idempotent maps have the following two properties. Let $X$ be a set and $p\colon X\to X$ idempotent.

• If $p(x)\ne x$ then no $y\in X$ is mapped to $x$;

• For $Y = p(X)$ the restriction of $p$ to $Y$ is the identity map $x\mapsto x$, and there holds $p(X) = \{x\in X\mid p(x) = x\}$.

Figure 5: Embedding of a discrete mapping in $\mathbb{R}^{\mathcal{I}}$ (continues the example in Figure 2). Left: a discrete node-wise mapping $p\colon\mathcal{X}\to\mathcal{X}$ is shown by arrows; it sends the green labeling to red and the blue one to black. Right: there is a corresponding linear map $P\colon\mathbb{R}^{\mathcal{I}}\to\mathbb{R}^{\mathcal{I}}$ with this action on labelings embedded as vertices. It is an oblique projection which maps the polytope $\mathcal{M}$ onto the red facet $P(\mathcal{M})$.

It follows that, knowing an improving mapping $p$, we can eliminate labels $(s,i)$ for which $p_s(i)\ne i$, and there will remain at least one global minimizer of $E_f$.
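The idempotent power and the resulting label elimination can be sketched as follows. The map is a hypothetical example, and the representation (one list per node, `p[s][i]` $= p_s(i)$) is our own choice rather than anything prescribed by the text.

```python
import math

def compose(p, q):
    """Node-wise composition (p o q)_s(i) = p_s(q_s(i))."""
    return [[ps[qs[i]] for i in range(len(qs))] for ps, qs in zip(p, q)]

def power(p, k):
    """k-fold composition p^k for k >= 1."""
    result = p
    for _ in range(k - 1):
        result = compose(p, result)
    return result

def idempotent_power(p):
    """p^k with k = (max_s |X_s|)!, which is idempotent for any node-wise p."""
    k = math.factorial(max(len(ps) for ps in p))
    return power(p, k)

def eliminated_labels(p):
    """Labels (s, i) with p_s(i) != i, i.e., outside the range of an idempotent p."""
    return [(s, i) for s, ps in enumerate(p) for i in range(len(ps)) if ps[i] != i]

# Hypothetical node-wise map: node 0 sends 0 -> 1 and swaps labels 1 and 2; node 1 sends 1 -> 0.
p = [[1, 2, 1], [0, 0]]
pk = idempotent_power(p)
```

For node 0 the power with $k = 3! = 6$ turns the 2-cycle between labels 1 and 2 into the identity and sends label 0 to 2, so only label $(0, 0)$ is eliminated there; in node 1 label 1 is eliminated.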

Given a mapping $p$, the verification of the improving property (12) is NP-hard, since already in the quadratic pseudo-Boolean case the verification of the autarky property (11) is NP-hard [8]. A tractable sufficient condition will be constructed by embedding the mapping into the space $\mathbb{R}^{\mathcal{I}}$ and applying the relaxation there.

2.2. Relaxed Improving Mapping 

Definition 2.2. A linear extension of $p\colon\mathcal{X}\to\mathcal{X}$ is a linear mapping $P\colon\mathbb{R}^{\mathcal{I}}\to\mathbb{R}^{\mathcal{I}}$ that satisfies

$$(\forall x\in\mathcal{X})\quad \delta(p(x)) = P\delta(x). \tag{14}$$

See Figure 5 for an illustration. Avoiding the discussion of uniqueness³, we will only use the following linear extension for a node-wise mapping $p\colon\mathcal{X}\to\mathcal{X}$, which will be denoted $[p]$. The linear extension $P = [p]$ is defined by

$$(P\mu)_c(x_c) = \sum_{x'_c\in\mathcal{X}_c} P_{c,x_c,x'_c}\,\mu_c(x'_c) \tag{15}$$

with coefficients

$$P_{c,x_c,x'_c} := \prod_{s\in c}[p_s(x'_s) = x_s] = [p_c(x'_c) = x_c]. \tag{16}$$

³ When a linear extension exists, its restriction to the affine hull of $\delta(\mathcal{X})$ is unique.

These coefficients should be understood as a “matrix” representation of $P$. To verify that (14) holds, we simply substitute an integer labeling $\delta(x)$ and expand the components as

$$(P\delta(x))_c(x'_c) = \sum_{x''_c\in\mathcal{X}_c}\prod_{s\in c}[p_s(x''_s) = x'_s]\,[x_c = x''_c] = [p_c(x_c) = x'_c] = \delta(p(x))_c(x'_c). \tag{17}$$
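The coefficients (16) can be assembled explicitly and identity (14) checked numerically. The sketch below also forms the vector $(I - P^\top)f$ that appears in the verification LP (24) and checks that $\langle (I - P^\top)f,\delta(x)\rangle = E_f(x) - E_f(p(x))$ at every vertex. The model, costs and map are hypothetical, and vectors are stored as dicts keyed by the index set.

```python
import itertools

# Hypothetical pairwise model: node 0 has labels {0, 1, 2}, node 1 has labels {0, 1}.
labels = {0: [0, 1, 2], 1: [0, 1]}
edges = [(), (0,), (1,), (0, 1)]

def assignments(c):
    return list(itertools.product(*(labels[s] for s in c)))

index = [(c, xc) for c in edges for xc in assignments(c)]

def delta(x):
    """Embedding (3) as a dict over the index set I."""
    return {(c, xc): int(tuple(x[s] for s in c) == xc) for (c, xc) in index}

def linear_extension(p):
    """Coefficients (16): P[c, x_c, x'_c] = prod over s in c of [p_s(x'_s) == x_s]."""
    return {(c, xc, xc2): int(all(p[s][xc2[k]] == xc[k] for k, s in enumerate(c)))
            for c in edges for xc in assignments(c) for xc2 in assignments(c)}

def apply_P(P, mu):
    """Eq. (15): (P mu)_c(x_c) = sum over x'_c of P[c, x_c, x'_c] * mu_c(x'_c)."""
    return {(c, xc): sum(P[(c, xc, xc2)] * mu[(c, xc2)] for xc2 in assignments(c))
            for (c, xc) in index}

def apply_PT(P, v):
    """Transpose: (P^T v)_c(x'_c) = sum over x_c of P[c, x_c, x'_c] * v_c(x_c)."""
    return {(c, xc2): sum(P[(c, xc, xc2)] * v[(c, xc)] for xc in assignments(c))
            for (c, xc2) in index}

def dot(u, v):
    return sum(u[i] * v[i] for i in index)

p = {0: [0, 1, 0], 1: [1, 1]}                               # hypothetical idempotent node-wise map
P = linear_extension(p)
f = {(c, xc): 3 * len(c) - sum(xc) for (c, xc) in index}    # arbitrary costs
PTf = apply_PT(P, f)
g = {i: f[i] - PTf[i] for i in index}                       # g = (I - P^T) f

for x in itertools.product(labels[0], labels[1]):
    px = tuple(p[s][x[s]] for s in (0, 1))
    assert apply_P(P, delta(x)) == delta(px)                         # identity (14)
    assert dot(g, delta(x)) == dot(f, delta(x)) - dot(f, delta(px))  # E_f(x) - E_f(p(x))
```

Within each hyperedge every “column” $x'_c$ of $P$ contains exactly one 1, in the row $p_c(x'_c)$. For the empty hyperedge the empty product gives the single coefficient 1, so $[p]$ preserves $\mu_\emptyset$.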

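Using the linear extension $P$ of $p$ we can write

$$E_f(p(x)) = \langle f,\delta(p(x))\rangle = \langle f,P\delta(x)\rangle. \tag{18}$$

This allows us to express the condition of an improving mapping (12) as

$$(\forall x\in\mathcal{X})\quad \langle f,P\delta(x)\rangle\le\langle f,\delta(x)\rangle \tag{19}$$

or equivalently, fully in the embedding, as

$$(\forall\mu\in\delta(\mathcal{X}))\quad \langle f,P\mu\rangle\le\langle f,\mu\rangle. \tag{20}$$

Taking convex combinations in (20), we obtain an equivalent condition

$$(\forall\mu\in\mathcal{M})\quad \langle f,P\mu\rangle\le\langle f,\mu\rangle. \tag{21}$$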

Thus we have linearized the inequalities necessary for an improving mapping. However, the marginal polytope $\mathcal{M}$ is not tractable. We introduce a sufficient condition by requiring that the same inequality (21) be satisfied over a larger (tractable) polytope $\Lambda\supset\mathcal{M}$.

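Definition 2.3. A linear mapping $P\colon\mathbb{R}^{\mathcal{I}}\to\mathbb{R}^{\mathcal{I}}$ is (weak) $\Lambda$-improving for $f$ if

$$(\forall\mu\in\Lambda)\quad \langle f,P\mu\rangle\le\langle f,\mu\rangle; \tag{22}$$

and is strict $\Lambda$-improving for $f$ if

$$(\forall\mu\in\Lambda,\ P\mu\ne\mu)\quad \langle f,P\mu\rangle < \langle f,\mu\rangle. \tag{23}$$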

Statement 2.4. Let $P\colon\mathbb{R}^{\mathcal{I}}\to\mathbb{R}^{\mathcal{I}}$ be a linear extension of $p\colon\mathcal{X}\to\mathcal{X}$ and $\Lambda$ a relaxation polytope. If $P$ is $\Lambda$-improving for $f$ then $p$ is improving for $f$.

Proof. A relaxed-improving mapping $P$ satisfies inequality (22) over a superset $\Lambda$ of $\mathcal{M}$, therefore condition (21) is satisfied, which by the definition of the extension is equivalent to (12). □

The set of mappings for which (22) (resp. (23)) is satisfied will be denoted $\mathcal{W}_f$ (resp. $\mathcal{S}_f$). For convenience, we will use the term relaxed improving when the relaxation is clear from the context.

Naturally, a strict relaxed improving map is relaxed improving, i.e., $\mathcal{S}_f\subset\mathcal{W}_f$. This is so because for all $\mu\in\Lambda$ such that $P\mu = \mu$ the inequality (22) is trivially satisfied.

Next we show that deciding whether $P\in\mathcal{W}_f$ (resp. $P\in\mathcal{S}_f$) for a given $P$ can be done in polynomial time. The definition (22) of $P\in\mathcal{W}_f$ is equivalent to the expression

$$\min_{\mu\in\Lambda}\langle (I - P^\top)f,\mu\rangle\ge 0. \tag{24}$$

The optimization problem in (24) will therefore be called the verification LP. As a linear program over a tractable polytope $\Lambda$, it can be solved in polynomial time, and hence the decision problem $P\in\mathcal{W}_f$ is solvable in polynomial time.

In order to show that the verification of $[p]\in\mathcal{S}_f$ can also be decided in polynomial time, we introduce the following equivalent reformulation.

Statement 2.5. Let $\mathcal{O} = \operatorname{argmin}_{\mu\in\Lambda}\langle f,(I - P)\mu\rangle$. There holds $P\in\mathcal{S}_f$ iff

$$\mathcal{O} = P(\Lambda). \tag{25}$$

Proof on p. 23.

The statement says that for a strict relaxed improving mapping the set of all optimal solutions to the verification LP is exactly the image $P(\Lambda)$. This can be further expressed in components of the mapping and of the support set of $\mathcal{O}$:

Statement 2.6. Let $\mathcal{O}_c = \{x_c\in\mathcal{X}_c\mid(\exists\mu\in\mathcal{O})\ \mu_c(x_c) > 0\}$. There holds $[p]\in\mathcal{S}_f$ iff

$$(\forall c\in\mathcal{E})\quad \mathcal{O}_c = p_c(\mathcal{X}_c). \tag{26}$$

Proof on p. 23.

Now, in order to decide $[p]\in\mathcal{S}_f$ in polynomial time, we can solve the verification LP (24), obtain the support sets $\mathcal{O}_c$ of its optimal solutions and check condition (26).
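For intuition, this procedure can be simulated in the degenerate case $\Lambda = \mathcal{M}$, where no LP solver is needed: the optimal face of the verification LP is the convex hull of the vertices $\delta(x)$ minimizing $E_f(x) - E_f(p(x))$, so $\mathcal{O}_c$ is just the set of restrictions $x_c$ of those minimizers. Using the marginal polytope instead of a tractable relaxation is an assumption made only to keep the sketch self-contained and exact; the toy model and both maps are hypothetical.

```python
import itertools

# Hypothetical two-node model with labels {0, 1, 2}: unary costs plus a Potts term.
f0 = [0, 1, 5]
f1 = [2, 0, 4]
edges = [(), (0,), (1,), (0, 1)]
X = list(itertools.product(range(3), repeat=2))

def E(x):
    return f0[x[0]] + f1[x[1]] + (1 if x[0] != x[1] else 0)

def apply_map(p, x):
    return tuple(p[s][x[s]] for s in range(len(x)))

def strict_check_exact(p):
    """Check (26) with Lambda = M standing in for a tractable relaxation."""
    gap = {x: E(x) - E(apply_map(p, x)) for x in X}  # <(I - P^T) f, delta(x)>
    best = min(gap.values())
    if best < 0:
        return False                                 # (24) fails, so p is not even weakly improving
    optimal = [x for x in X if gap[x] == best]       # vertices of the optimal face O
    for c in edges:
        support = {tuple(x[s] for s in c) for x in optimal}        # O_c
        image = {tuple(apply_map(p, x)[s] for s in c) for x in X}  # p_c(X_c)
        if support != image:
            return False
    return True

p = [[0, 1, 0], [0, 1, 1]]  # strictly improving for this model
r = [[0, 0, 2], [0, 1, 2]]  # improving, but (1, 1) -> (0, 1) keeps the energy unchanged
```

Both maps are improving, but only `p` passes: for `r` the labeling $(1,1)$ minimizes the verification objective although `r` moves it, so label 1 of node 0 appears in the support for $c = \{0\}$ while it is not in $r_0(\mathcal{X}_0)$.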

2.3. Properties 

We next give necessary conditions on $p$ for $[p]\in\mathcal{W}_f$ or $[p]\in\mathcal{S}_f$. They help to narrow down the set of maps to be considered. A relaxed improving map must preserve optimality of solutions to the relaxation and consequently their support set (again in components).
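Lemma 2.7 (Necessary conditions I). Let $p\colon\mathcal{X}\to\mathcal{X}$ be node-wise and $P = [p]$. Let $\mathcal{O} = \operatorname{argmin}_{\mu\in\Lambda}\langle f,\mu\rangle$ and $\mathcal{O}_c = \{x_c\in\mathcal{X}_c\mid(\exists\mu\in\mathcal{O})\ \mu_c(x_c) > 0\}$. Then

(i) For $P\in\mathcal{W}_f$ there holds

$$P(\mathcal{O})\subset\mathcal{O}; \tag{27a}$$

$$(\forall c\in\mathcal{E})\quad p_c(\mathcal{O}_c)\subset\mathcal{O}_c; \tag{27b}$$

(ii) For $P\in\mathcal{S}_f$ there holds

$$P(\mathcal{O}) = \mathcal{O}; \tag{28a}$$

$$(\forall c\in\mathcal{E})\quad p_c(\mathcal{O}_c) = \mathcal{O}_c. \tag{28b}$$

Proof on p. 23.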

Next, we reformulate the problems $[p]\in\mathcal{W}_f$ and $[p]\in\mathcal{S}_f$ dually, i.e., not with the quantifier $(\forall\mu\in\Lambda)$ as in (22) but with existence quantifiers. This will become important in the formulation of the maximum persistency problem, where we optimize over $p$ subject to the constraints $[p]\in\mathcal{W}_f$ (resp. $[p]\in\mathcal{S}_f$). Recall that the set $\mathcal{W}_f$ is defined for the relaxation polytope $\Lambda = \{\mu\in\mathbb{R}^{\mathcal{I}}\mid A\mu\ge 0;\ \mu_\emptyset = 1;\ \mu\ge 0\}$, where $A\in\mathbb{R}^{m\times|\mathcal{I}|}$.

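Theorem 2.8 (Dual representation of $\mathcal{W}_f$). The set $\mathcal{W}_f$ can be expressed as

$$\{P\colon\mathbb{R}^{\mathcal{I}}\to\mathbb{R}^{\mathcal{I}}\mid(\exists\varphi\in\mathbb{R}^m_+)\ f - A^\top\varphi - P^\top f\ge 0\}. \tag{29}$$

Proof. Denote $g = (I - P^\top)f$. Condition (24), equivalent to (22), can be stated yet equivalently for the conical hull of $\Lambda$:

$$\inf_{\mu\in\operatorname{coni}(\Lambda)}\langle g,\mu\rangle\ge 0. \tag{30}$$

This is because for any $\mu\in\Lambda$ with $\langle g,\mu\rangle\ge 0$ and any $\alpha\ge 0$ the vector $\alpha\mu$ satisfies $\langle g,\alpha\mu\rangle = \alpha\langle g,\mu\rangle\ge 0$ as well. Using the expression for the conical hull of $\Lambda$ in (10), we can write the minimization problem in (30) and its dual as

$$\inf_{\substack{A\mu\ge 0\\ \mu\ge 0}}\langle g,\mu\rangle \qquad\qquad \max_{\substack{\varphi\in\mathbb{R}^m_+\\ g - A^\top\varphi\ge 0}} 0. \tag{31}$$

Inequality (30) holds iff the primal problem is bounded, and, since it is always feasible ($\mu = 0$), it is bounded iff the dual is feasible, which is the case iff $(\exists\varphi\in\mathbb{R}^m_+)\ (f - A^\top\varphi) - P^\top f\ge 0$. □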

The set $\mathcal{S}_f$ is defined via a more complicated quantifier $(\forall\mu\in\Lambda,\ P\mu\ne\mu)$. Fortunately, the following dual reformulation holds for node-wise maps:

Theorem 2.9 (Dual representation of $\mathcal{S}_f$). Let $p\colon\mathcal{X}\to\mathcal{X}$ be node-wise. Then: (i) there exists $\varepsilon > 0$ such that $[p]\in\mathcal{S}_f$ iff

$$(\exists\varphi\in\mathbb{R}^m_+)\quad f - A^\top\varphi - [p]^\top f\ge\varepsilon h, \tag{32}$$

where $h$ is a function such that $h\ge 0$ and $h_c(x_c) = 0$ iff $p_c(x_c) = x_c$; and (ii) for rational inputs (including $h$) the value of $\varepsilon$ in (i) is a rational number of polynomial bit length. Proof on p. 24.

The constraint $[p]\in\mathcal{S}_f$ can thus be reduced to nearly the same representation as (29), with the addition of an $\varepsilon h$ slack term. By construction, for $\mu\ge 0$ the contribution $\varepsilon\langle h,\mu\rangle$ of this term is zero iff $[p]\mu = \mu$. In practice, taking a larger value of $\varepsilon$ always results in a sufficient condition for $\mathcal{S}_f$ and hence does not break correctness. In theory, we want a very small $\varepsilon$, but not so small that it would break polynomiality of the reformulation, which is ensured by part (ii). Note that while the set $\mathcal{S}_f$ in the space of all maps $\mathbb{R}^{\mathcal{I}}\to\mathbb{R}^{\mathcal{I}}$ is convex but not closed (as seen from definition (23)), the theorem encloses the discrete maps of our interest, $\{[p]\mid p\colon\mathcal{X}\to\mathcal{X}\text{ node-wise}\}$, in a closed (convex) polytope.

Finally we give a necessary condition for $\mathcal{W}_f$. The theorem has a primal and a dual counterpart. The primal counterpart states that when solving the verification LP, because its objective $(I - P^\top)f$ lies in the null space of $P^\top$ (for idempotent $P$), the constraints of the problem can be projected onto the same subspace, providing a simplification. The dual counterpart states that there always exist dual multipliers such that the improving property holds component-wise for reparametrized costs. This is useful in proofs, providing an alternative reformulation of the local inequalities (29).

Theorem 2.10 (Necessary conditions II). Let $P\colon\mathbb{R}^{\mathcal{I}}\to\mathbb{R}^{\mathcal{I}}$ be idempotent, $P(\Lambda)\subset\Lambda$ and $P\in\mathcal{W}_f$. Then

$$\inf_{\substack{A(I-P)\mu\ge 0\\ \mu\ge 0}}\langle (I - P^\top)f,\mu\rangle = 0; \tag{33a}$$

$$(\exists\varphi\in\mathbb{R}^m_+)\quad (I - P^\top)(f - A^\top\varphi)\ge 0. \tag{33b}$$

Proof on p. 24.

These conditions become necessary and sufficient for standard relaxations, as discussed in §3.1. The constraint $A(I - P)\mu\ge 0$ in (33a) replaces the constraint $A\mu\ge 0$ in (31) and simplifies the problem.

2.4. Maximum Relaxed Improving Mapping

We showed in §2.2 that the weak/strict relaxed-improving property can be verified in polynomial time and have described the sets $\mathcal{W}_f$, $\mathcal{S}_f$. Any relaxed-improving map, with the exception of the identity, eliminates some labels as non-optimal. Recall that the label $(s,i)$ is eliminated by a node-wise mapping $p$ if $p_s(i)\ne i$. We formulate the following maximum persistency problem:

$$\max_{p}\ \sum_{s,i}[p_s(i)\ne i]\quad\text{s.t.}\quad [p]\in\mathcal{W}_f, \tag{max-wi}$$

i.e., we directly maximize the number of eliminated labels. The strict variant, with the constraint $[p]\in\mathcal{S}_f$, will be denoted (max-si).

| order | labels | maps | (max-si) | (max-wi) |
|---|---|---|---|---|
| pairwise | 2 | all | P | P |
| higher-order | 2 | all | P | NP-hard |
| any | any | $\mathcal{P}^{1,y}$ or $\mathcal{P}^{2,y}$ | P | P |
| any | any | $\mathcal{P}^{1}$ | P | NP-hard |
| pairwise | $\ge 4$ | $\mathcal{P}^{2}$ | NP-hard | NP-hard |

Table 1: Polynomiality of finding the maximum strict/weak relaxed-improving mapping for a general relaxation.

The problem may look difficult to solve. Indeed, it optimizes over discrete maps and involves a general polyhedral relaxation in the specification of the constraints. Nevertheless, if we place some additional restrictions on the set of maps, it turns out to be solvable in polynomial time in a number of cases summarized in Table 1. One of them is the pseudo-Boolean case, where there are only 3 possible idempotent maps for every node: $(0,1)\mapsto(1,1)$, $(0,1)\mapsto(0,0)$ and $(0,1)\mapsto(0,1)$. Problem (max-si) turns out to be solvable in polynomial time in this case. For multilabel problems, node-wise mappings are more diverse. Motivated by the goal to include/generalize existing multilabel methods, we introduce the following sets of maps:

all-to-one maps. The set V 1)V of maps p of the 
form p: x >—> x[A<—y\ for all A c: V and fixed y e X. 
This class is a straightforward generalization of the 
overwrite operation in the autarky (11). A mapping 
p 6 V 1,y is illustrated in Figure 10(a). There are only 
two possible choices for every node s. The mapping p s 
either contracts X s to a single label {y s } or retains X s 
unchanged. This class allows to explain one-against- 
all method of Kovtun [39] and the central part of the 
method of Swoboda et al. [64] as discussed in §4.4, 
§4.5. 

all-to-one-unknown maps. Set $\mathcal{V}^1 = \bigcup_{y \in \mathcal{X}} \mathcal{V}^{1,y}$. A mapping $p \in \mathcal{V}^1$ has the same form as above, $p\colon x \mapsto x[A \leftarrow y]$; however, the labeling $y$ is not fixed now but is a part of the specification of the mapping, see Figure 11. In every node there are $|\mathcal{X}_s| + 1$ choices for $p_s$: send all labels to a single one (which may be chosen) or change nothing. It is easy to see that in the case of two labels, $\mathcal{V}^1$ contains all idempotent node-wise maps. As will be shown later, the (max-si) problem over this class decomposes into determining $y$ from the integral part of the solution to the relaxation and solving the (max-si) problem over $\mathcal{V}^{1,y}$.

Figure 6: Example of a map in the subset-to-one class $\mathcal{V}^{2,y}$. Labeling $y$ is fixed while a map $p$ can select a subset of labels in every node $s$ that are sent to $y_s$. Nodes without an outgoing arrow are mapped to themselves.

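subset-to-one maps. The set of maps $\mathcal{V}^{2,y}$ is defined as follows. Let $\mathcal{V} = \{(s,i) \mid s \in V,\, i \in \mathcal{X}_s\}$ be the set of labels in all nodes and let $\zeta \in \{0,1\}^{\mathcal{V}}$. Mapping $p_\zeta \in \mathcal{V}^{2,y}$ in every node either preserves the label $x_s$ or overwrites it with $y_s$:

$$p_\zeta(x)_s = \begin{cases} y_s & \text{if } \zeta_{s,x_s} = 0, \\ x_s & \text{if } \zeta_{s,x_s} = 1. \end{cases} \tag{34}$$

Vector $(\zeta_{s,i} \mid i \in \mathcal{X}_s)$ serves as the indicator of the subset of labels in node $s$ that stay immovable, while all other labels are mapped to $y_s$, see Figure 6. In a node $s$ there are $2^{|\mathcal{X}_s| - 1}$ choices for $p_s$ (the value of $\zeta_{s,y_s}$ does not matter, since $y_s$ is mapped to itself either way). Clearly, this class generalizes $\mathcal{V}^{1,y}$.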
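A minimal Python sketch of the map (34), assuming labels are encoded as integers and $\zeta$ as a dict keyed by (node, label); `p_zeta` and `node_map_count` are hypothetical helper names. It also enumerates the distinct node maps obtainable in one node, which matches the count $2^{|\mathcal{X}_s| - 1}$ stated above.

```python
from itertools import product
from typing import Dict, Tuple

Labeling = Dict[str, int]


def p_zeta(x: Labeling, y: Labeling, zeta: Dict[Tuple[str, int], int]) -> Labeling:
    """Subset-to-one map (34): keep x_s if zeta[(s, x_s)] == 1, otherwise overwrite with y_s."""
    return {s: x[s] if zeta[(s, x[s])] == 1 else y[s] for s in x}


def node_map_count(num_labels: int, y_s: int = 0) -> int:
    """Number of distinct maps p_s on labels {0, ..., num_labels - 1} realizable by (34)."""
    maps = set()
    for bits in product((0, 1), repeat=num_labels):
        maps.add(tuple(i if bits[i] == 1 else y_s for i in range(num_labels)))
    return len(maps)


y = {"a": 0, "b": 0}
zeta = {("a", 0): 0, ("a", 1): 1, ("a", 2): 0,
        ("b", 0): 0, ("b", 1): 1, ("b", 2): 1}
px = p_zeta({"a": 2, "b": 1}, y, zeta)
print(px)                          # {'a': 0, 'b': 1}
print(p_zeta(px, y, zeta) == px)   # True: the map is idempotent
print(node_map_count(4))           # 8 == 2 ** (4 - 1)
```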

The main result of this paper is that both (max-wi) and (max-si) problems are tractable for the class $\mathcal{V}^{2,y}$. Other tractability results in Table 1 are obtained as corollaries. Intractability results are shown to hold for the basic LP relaxation in §3.1.


2.5. Formulation for Subset-to-one Maps 

In the following three subsections we gradually show that the (max-wi) problem over the class $\mathcal{V}^{2,y}$ can be written as a mixed integer linear program in which the integrality constraints can be relaxed without loss of tightness; thus we obtain an equivalent LP formulation.

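Using the dual representation (29) of the constraint $[p] \in \mathcal{W}_f$ and the form of the mapping (34), the problem (max-wi) becomes

$$\min \sum_{s \in V} \sum_{i \in \mathcal{X}_s} \zeta_{s,i} \tag{35a}$$

$$\big(\zeta_{s,i} \in \{0,1\} \mid s \in V,\; i \in \mathcal{X}_s\big); \quad \varphi \in \mathbb{R}^m_+;$$

$$f - A^{\mathsf T}\varphi - [p_\zeta]^{\mathsf T} f \ge 0. \tag{35b}$$

(S1) monomial $\prod_{s \in D} z_s$, $D \subset C$ → new variable $\zeta_D$, with $\zeta_\emptyset = 1$
(S2) multilinear polynomial $g(z) = \sum_{D \subset C} \alpha_D \prod_{s \in D} z_s$ → linearization $G(\zeta) = \sum_{D \subset C} \alpha_D \zeta_D$
(S3) linearization properties: $g_1(z) + g_2(z) \to G_1(\zeta) + G_2(\zeta)$; $\alpha g(z) \to \alpha G(\zeta)$; $(\forall z \in \{0,1\}^C)\; g(z) = 0 \;\Leftrightarrow\; (\forall \zeta)\; G(\zeta) = 0$
(S4) identity inequality for $B \subset C$: $\prod_{s \in B} z_s \prod_{s \in C \setminus B} (1 - z_s) \ge 0$ → new constraint $\sum_{D \subset C \setminus B} (-1)^{|D|} \zeta_{D \cup B} \ge 0$
(S5) any other identity inequality: $(\forall z \in \{0,1\}^C)\; g(z) \ge 0 \;\Leftrightarrow\; (\forall \zeta \in \mathcal{Z}_C)\; G(\zeta) \ge 0$

Table 2: Summary of correspondences in the relaxation approach [58] within a hyperedge C.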

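Constraints (35b) involve a complicating expression $[p_\zeta]$. Let us express the coefficients $P_{C,x_C,x'_C}$ (16) of the linear extension $P = [p_\zeta]$. Substituting the mapping $p_\zeta$ (34), they are expressed as polynomials in $\zeta$:

$$P_{C,x_C,x'_C} = \prod_{s \in C} [\![\, p_\zeta(x')_s = x_s \,]\!] = \prod_{s \in C} \Big( \zeta_{s,x'_s} [\![\, x_s = x'_s \,]\!] + [\![\, y_s = x_s \,]\!] (1 - \zeta_{s,x'_s}) \Big). \tag{36}$$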
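The two forms in (36) can be cross-checked by brute force. The Python sketch below (our own illustration) evaluates the indicator form and the polynomial form for every binary $\zeta$ on a small hyperedge; `zeta[s][i]` stands for $\zeta_{s,i}$ with the nodes of $C$ indexed by position, a hypothetical encoding.

```python
from itertools import product


def coeff_indicator(xC, xpC, yC, zeta):
    """First form of (36): prod_s [[p_zeta(x')_s == x_s]], with p_zeta from (34)."""
    out = 1
    for s, (xs, xps, ys) in enumerate(zip(xC, xpC, yC)):
        image = xps if zeta[s][xps] == 1 else ys
        out *= int(image == xs)
    return out


def coeff_polynomial(xC, xpC, yC, zeta):
    """Second form of (36): prod_s (zeta_{s,x'_s}[[x_s = x'_s]] + [[y_s = x_s]](1 - zeta_{s,x'_s}))."""
    out = 1
    for s, (xs, xps, ys) in enumerate(zip(xC, xpC, yC)):
        z = zeta[s][xps]
        out *= z * int(xs == xps) + int(ys == xs) * (1 - z)
    return out


def forms_agree(num_labels, yC):
    """Compare both forms for every binary zeta and all pairs (x_C, x'_C)."""
    n = len(yC)
    labels = range(num_labels)
    for bits in product((0, 1), repeat=n * num_labels):
        zeta = [bits[s * num_labels:(s + 1) * num_labels] for s in range(n)]
        for xC in product(labels, repeat=n):
            for xpC in product(labels, repeat=n):
                if coeff_indicator(xC, xpC, yC, zeta) != coeff_polynomial(xC, xpC, yC, zeta):
                    return False
    return True


print(forms_agree(3, (0, 0)))  # True
```

Since $p_\zeta(x')$ is a single labeling, each column $x'_C$ of the resulting matrix has exactly one entry equal to 1.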

It appears that after expanding $[p_\zeta]$ using (36), the constraint (35b) that we need to represent will involve products of binary variables $\prod_{s \in D} \zeta_{s,x'_s}$ for all $C \in \mathcal{E}$, $D \subset C$ and $x'_D \in \mathcal{X}_D$. To reach the ILP formulation we are going to replace each such product with a substitute variable $\zeta_{D,x'_D}$. This is achieved with the help of the relaxation of Sherali and Adams [58].

2.6. Relaxation of Sherali and Adams 

The relaxation of Sherali and Adams [58] is applicable to polynomial programs with binary variables $z \in \{0,1\}^V$. The relaxation of order $d$ performs a simultaneous lifting for all subsets of variables $C \subset V$ with $|C| = d$. Let us focus on a single hyperedge $C$, chosen for generality from the set of hyperedges $\mathcal{E} \subset 2^V$. The construction and its properties (within hyperedge $C$) are summarized in Table 2. For every product $\prod_{s \in D} z_s$, $D \subset C$, a new variable $\zeta_D$ is introduced (S1). A pseudo-Boolean function $g\colon \{0,1\}^C \to \mathbb{R}$ is linearized by writing it as a multilinear polynomial and replacing each monomial $\prod_{s \in D} z_s$ with the new variable $\zeta_D$ (S2). From this definition we have the linearity properties (S3), in particular:

Lemma 2.11 (Identity Equality (S3)). Let $G(\zeta)$ be the linearization of $g(z)$. Then $g(z) = 0$ for all $z \in \{0,1\}^C$ iff $G(\zeta) = 0$ for all $\zeta \in \mathbb{R}^{2^C}$. Proof on p. 25.

Next, constraints on the new variables are added which correspond to the identity inequalities $\prod_{s \in B} z_s \prod_{s \in C \setminus B} (1 - z_s) \ge 0$ for each $B \subset C$. Clearly, this inequality holds for all $z \in \{0,1\}^C$. By expanding this expression one obtains its equivalent multilinear polynomial $g(z) = \sum_{D \subset C \setminus B} (-1)^{|D|} \prod_{s \in D \cup B} z_s$. Constraints (S4) ensure that the linearization of this expression is nonnegative. The set of all such constraints defines the polytope

$$\mathcal{Z}_C = \Big\{ \zeta \in \mathbb{R}^{2^C} \;\Big|\; \zeta_\emptyset = 1,\; (\forall B \subset C)\; \sum_{D \subset C \setminus B} (-1)^{|D|} \zeta_{D \cup B} \ge 0 \Big\}. \tag{37}$$
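To make (37) concrete, the following Python sketch (our own illustration, with hypothetical helper names) builds the lifted vector $\zeta(z)$ of (S1) for a binary $z$ and evaluates the left-hand sides of (37). For binary $z$ each left-hand side equals the identity product $\prod_{s \in B} z_s \prod_{s \in C \setminus B}(1 - z_s)$, so exactly one of them is 1 and the rest are 0; for a convex combination of lifted vectors all of them stay non-negative.

```python
from itertools import combinations, product


def subsets(items):
    """All subsets of `items` as frozensets (including the empty set)."""
    items = tuple(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def lift(z):
    """(S1): zeta(z)_D = prod_{s in D} z_s for every subset D of C; z maps node -> {0, 1}."""
    return {D: int(all(z[s] for s in D)) for D in subsets(z)}


def sa_lhs(zeta, C):
    """Left-hand sides of (37): sum over D in C minus B of (-1)^|D| zeta_{D union B}, per B."""
    C = frozenset(C)
    return {B: sum((-1) ** len(D) * zeta[D | B] for D in subsets(C - B)) for B in subsets(C)}


C = ("r", "s", "t")
for bits in product((0, 1), repeat=len(C)):
    z = dict(zip(C, bits))
    support = frozenset(s for s in C if z[s] == 1)
    # each constraint equals the identity product, which is 1 only for B = support
    assert all(v == (1 if B == support else 0) for B, v in sa_lhs(lift(z), C).items())

# a convex combination of lifted binary vectors satisfies (37) as well
z1, z2 = {"r": 1, "s": 0, "t": 1}, {"r": 1, "s": 1, "t": 0}
mix = {D: 0.5 * v1 + 0.5 * lift(z2)[D] for D, v1 in lift(z1).items()}
print(min(sa_lhs(mix, C).values()) >= 0)  # True
```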

In fact, polytope $\mathcal{Z}_C$ is the convex hull of all binary vectors $\zeta$ corresponding to configurations $z$:

Lemma 2.12 (Convex hull). Polytope $\mathcal{Z}_C$ equals the convex hull

$$\operatorname{conv}\Big\{ \zeta(z) \in \mathbb{R}^{2^C} \;\Big|\; \zeta(z)_D = \prod_{s \in D} z_s \;\; \forall D \subset C,\; \forall z \in \{0,1\}^C \Big\}. \tag{38}$$

Proof on p. 26.

From the convex hull representation there naturally follows an equivalence of identity inequalities before and after linearization:

Lemma 2.13 (Identity inequality (S5)). Let $G(\zeta)$ be the linearization of $g(z)$. Then $g(z) \ge 0$ for all $z \in \{0,1\}^C$ iff $G(\zeta) \ge 0$ for all $\zeta \in \mathcal{Z}_C$. Proof on p. 25.

In particular, for $\zeta \in \mathcal{Z}_C$ there holds $0 \le \zeta_D \le 1$ for $D \subset C$, a relation which is rather difficult to prove directly from (37). Finally, the next two results are necessary for our construction.

Theorem 2.14 (Lemma 2 of [58]). If $\zeta \in \mathcal{Z}_C$ and the unary components $\zeta_s$ are integer (i.e., equal to some $z_s \in \{0,1\}$) for all $s \in C$, then there holds $\zeta_D = \prod_{s \in D} z_s$ for all $D \subset C$.

Lemma 2.15 (Product). For $\zeta \in \mathcal{Z}_C$ there holds $\zeta^2 \in \mathcal{Z}_C$, where the product $\zeta^2 = \zeta\zeta$ is component-wise. Proof on p. 26.
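One way to see why Lemma 2.15 is plausible (our own illustration, not the proof on p. 26): by Lemma 2.12, a point of $\mathcal{Z}_C$ is the moment vector $\zeta_D = \Pr[z_s = 1 \;\forall s \in D]$ of some distribution over $\{0,1\}^C$, and its component-wise square is the moment vector of the component-wise AND of two independent samples, hence again a point of $\mathcal{Z}_C$. The Python sketch below checks this numerically for a random distribution.

```python
import random
from itertools import combinations, product


def subsets(items):
    """All subsets of `items` as frozensets (including the empty set)."""
    items = tuple(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def moments(dist, C):
    """zeta_D = sum_z pi(z) prod_{s in D} z_s: a point of the convex hull (38)."""
    return {D: sum(w for z, w in dist.items() if all(z[C.index(s)] for s in D))
            for D in subsets(C)}


C = ("r", "s", "t")
rng = random.Random(0)
configs = list(product((0, 1), repeat=len(C)))
weights = [rng.random() for _ in configs]
total = sum(weights)
pi = {z: w / total for z, w in zip(configs, weights)}

zeta = moments(pi, C)
zeta_sq = {D: v * v for D, v in zeta.items()}  # component-wise product zeta * zeta

# distribution of z AND z' for independent z, z' ~ pi
pi_and = {}
for (z1, w1), (z2, w2) in product(pi.items(), repeat=2):
    z = tuple(a & b for a, b in zip(z1, z2))
    pi_and[z] = pi_and.get(z, 0.0) + w1 * w2

# zeta^2 coincides with the moment vector of pi_and, so it lies in Z_C
print(all(abs(zeta_sq[D] - v) < 1e-12 for D, v in moments(pi_and, C).items()))  # True
```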

When applying the linearization to all hyperedges simultaneously, a variable $\zeta_{A \cap B}$ is introduced only once for (overlapping) hyperedges $A, B \in \mathcal{E}$. All local properties described above continue to hold for each hyperedge $C \in \mathcal{E}$ individually, but of course they need not hold for the whole set $V$.

2.7. Solution via Linear Program Formulation 

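Let us return to the reformulation (35) of (max-wi). It is clear that by opening the brackets in (36), the coefficients $P_{C,x_C,x'_C}$ can be expressed as

$$P_{C,x_C,x'_C} = \sum_{D \subset C} c_{C,D}(x_C, x'_D) \prod_{s \in D} \zeta_{s,x'_s}, \tag{39}$$

where $c_{C,D}(x_C, x'_D)$ are appropriate constants not depending on $\zeta$ (detailed in §A.3). Because for $x'_s = y_s$ there holds $p_\zeta(x')_s = y_s$ irrespective of $\zeta_{s,y_s}$ (label $y_s$ is always mapped to itself), we may assume that $\zeta_{s,y_s} = 0$, as well as all products involving it.

The relaxation of Sherali and Adams is applied as follows. Let us denote $\bar{\mathcal{X}}_s = \mathcal{X}_s \setminus \{y_s\}$ and, respectively, $\bar{\mathcal{X}}_D = \prod_{s \in D} \bar{\mathcal{X}}_s$. We substitute new variables $\zeta_{D,x'_D} \in \mathbb{R}$ in place of the products $\prod_{s \in D} \zeta_{s,x'_s}$ in (39). For zero products, i.e., for $x'_D \notin \bar{\mathcal{X}}_D$, we let $\zeta_{D,x'_D} = 0$. From now on, let $\zeta$ denote the vector of relaxed variables

$$\zeta = \big( \zeta_{D,x_D} \in \mathbb{R} \;\big|\; C \in \mathcal{E},\; D \subset C,\; x_D \in \mathcal{X}_D \big). \tag{40}$$

The new variables $\zeta$ must satisfy the following constraints, defining a polytope $\mathcal{Z}$:

$$\zeta_\emptyset = 1, \tag{41a}$$

$$(\forall C \in \mathcal{E},\; \forall D \subset C,\; \forall x'_D \in \mathcal{X}_D \setminus \bar{\mathcal{X}}_D)\quad \zeta_{D,x'_D} = 0, \tag{41b}$$

$$(\forall C \in \mathcal{E},\; \forall x'_C \in \mathcal{X}_C,\; \forall B \subset C)\quad \sum_{D \subset C \setminus B} (-1)^{|D|} \zeta_{D \cup B,\, x'_{D \cup B}} \ge 0. \tag{41c}$$

Polytope $\mathcal{Z}$ is the intersection of the polytopes $\mathcal{Z}_C$ (37), lifted to the space of all variables, over $C \in \mathcal{E}$ and $x'_C \in \mathcal{X}_C$. Let $P_\zeta$ denote the extension-linearization of (34), according to (39) and (S2), defined by

$$(P_\zeta)_{C,x_C,x'_C} = \sum_{D \subset C} c_{C,D}(x_C, x'_D)\, \zeta_{D,x'_D}. \tag{42}$$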

For our purpose it is necessary that the linearized map $P_\zeta$ preserves the relaxation polytope $\Lambda$: $P_\zeta(\Lambda) \subset \Lambda$. This constraint is expressed as $(\forall \mu \in \Lambda)$

$$(P_\zeta \mu)_\emptyset = 1; \tag{43a}$$

$$P_\zeta \mu \ge 0; \tag{43b}$$

$$A P_\zeta \mu \ge 0. \tag{43c}$$

We trivially have $(P_\zeta)_\emptyset = \zeta_\emptyset = 1$. It is also easy to show that $(P_\zeta)_{C,x_C,x'_C} \ge 0$ for $\zeta \in \mathcal{Z}$: before linearization, the coefficients $P_{C,x_C,x'_C}$ in the expression (36) are clearly non-negative, and by property (S5) it is guaranteed that $(P_\zeta)_{C,x_C,x'_C} \ge 0$ holds on $\mathcal{Z}$. Then for $\mu \ge 0$ there holds $P_\zeta \mu \ge 0$. Interestingly, the converse is also true (but this result is not necessary in the subsequent construction):

Theorem 2.16. Inequalities (41c) in the definition of polytope $\mathcal{Z}$ can be equivalently replaced with $P_\zeta \ge 0$. Proof on p. 26.

There remains constraint (43c). In the case of standard local relaxations (to be defined later), constraint (43c) holds automatically and need not be enforced. To account for general relaxations, we include constraint (43c) explicitly by representing it, similarly to Theorem 2.8, in the dual form as

$$(\exists \Phi \in \mathbb{R}^{m \times m}_+)\quad A P_\zeta - \Phi^{\mathsf T} A \ge 0. \tag{44}$$

We arrive at the following relaxation of (max-wi) as a linear program:

$$\min_{\zeta,\, \varphi,\, \Phi} \sum_{s,i} \zeta_{s,i} \tag{L1}$$

$$(I - P_\zeta^{\mathsf T}) f - A^{\mathsf T}\varphi \ge 0; \tag{45a}$$

$$A P_\zeta - \Phi^{\mathsf T} A \ge 0; \quad \Phi \ge 0; \tag{45b}$$

$$\zeta \in \mathcal{Z}. \tag{45c}$$

Constraint (45a) ensures that the mapping $P_\zeta$ is relaxed-improving, constraints (45b) that it preserves the polytope, $P_\zeta(\Lambda) \subset \Lambda$, and constraint (45c) ensures that for each $C \in \mathcal{E}$ the relaxed variables $(\zeta_{D,x_D} \mid D \subset C)$ stay in the local convex hull for $C$.

We claim that this relaxation is tight. As shown below, rounding down all components of $\zeta$ in a feasible solution maintains feasibility (with possibly different values of $\varphi$, $\Phi$) and can only improve the objective. The rounding is performed by constructing the composite mapping $P_\zeta P_\zeta$. If $P_\zeta$ is relaxed-improving, then so is $(P_\zeta)^2$, provided that it satisfies all feasibility constraints. The auxiliary lemma below establishes this feasibility: it verifies that $(P_\zeta)^2 = P_{\zeta^2}$. Starting from a non-integer $\zeta$ and building a feasible sequence by taking $\zeta \leftarrow \zeta^2$, we get each next point closer and closer to the integer limit.
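A tiny numerical illustration of the rounding step (our own, not part of the method): repeated component-wise squaring of a vector with entries in $[0,1]$ never increases any entry, keeps the entries equal to 1, and drives all other entries to 0. This is the limit (47) used in the proof of Theorem 2.18; the sketch only shows the arithmetic on $\zeta$ and does not check feasibility.

```python
def square_iterates(zeta, steps=60):
    """Return the iterates zeta, zeta^2, zeta^4, ..., i.e. zeta^(2^n) for n = 0..steps."""
    iterates = [list(zeta)]
    for _ in range(steps):
        iterates.append([v * v for v in iterates[-1]])
    return iterates


its = square_iterates([1.0, 0.999, 0.5, 0.0])
print(its[-1])  # [1.0, 0.0, 0.0, 0.0], i.e. the indicator [[zeta == 1]]
# the objective sum(zeta) never increases along the sequence
print(all(sum(b) <= sum(a) for a, b in zip(its, its[1:])))  # True
```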
Lemma 2.17. For $\zeta \in \mathcal{Z}$ there holds $(P_\zeta)^2 = P_{\zeta^2}$. Proof on p. 27.

Theorem 2.18. In a solution $(\zeta, \varphi, \Phi)$ to (L1), the vector $\zeta$ is integer.

Proof. Because $\zeta$ is feasible to (L1), the mapping $P_\zeta$ is $\Lambda$-improving for $f$. Note that at this point, unless $\zeta$ is integer, it is not guaranteed that $P_\zeta(\mathcal{M}) \subset \mathcal{M}$, and we cannot draw any partial optimality from it; neither is $P_\zeta$ guaranteed to be idempotent. By constraints (45b), (45c), there holds $P_\zeta(\Lambda) \subset \Lambda$. Therefore

$$(\forall \mu \in \Lambda)\quad \langle f, P_\zeta P_\zeta \mu \rangle \le \langle f, P_\zeta \mu \rangle \le \langle f, \mu \rangle. \tag{46}$$

It follows that $P_\zeta^2 = P_\zeta P_\zeta$ is $\Lambda$-improving. Since $P_\zeta(\Lambda) \subset \Lambda$, also $P_\zeta^2(\Lambda) \subset P_\zeta(\Lambda) \subset \Lambda$.

By Lemma 2.17 there holds $P_\zeta^2 = P_{\zeta^2}$, and by Lemma 2.15 $\zeta^2 \in \mathcal{Z}$. By induction, there holds $P_\zeta^{2^n} = P_{\zeta^{2^n}}$, $P_\zeta^{2^n}(\Lambda) \subset \Lambda$, and $P_\zeta^{2^n}$ is $\Lambda$-improving. Let

$$\zeta^*_{D,x_D} = \lim_{n \to \infty} \zeta^{2^n}_{D,x_D} = [\![\, \zeta_{D,x_D} = 1 \,]\!]. \tag{47}$$

Since $P_{\zeta^*}$ is $\Lambda$-improving and $\zeta^* \in \mathcal{Z}$, it is feasible to (L1). Assume for contradiction that there exists $(s', i')$ such that $0 < \zeta_{s',i'} < 1$. From (47) we have $\zeta^*_{s,i} \le \zeta_{s,i}$ for all $s, i$ and $\zeta^*_{s',i'} < \zeta_{s',i'}$. It follows that $\zeta^*$ achieves a strictly better objective value, which contradicts the optimality of $\zeta$. If all unary components $\zeta_{s,i}$ are integer, then by Theorem 2.14 $\zeta$ is integer. □

Corollary 2.19. The optimal solution to (L1) is unique.

Proof. Assume for contradiction that $\zeta_1$, $\zeta_2$ are two distinct integer solutions to (L1). Since (L1) is a linear program, their combination $\zeta = (\zeta_1 + \zeta_2)/2$ is an optimal solution (the values of $\varphi$, $\Phi$ are omitted for clarity). But if $\zeta_1 \neq \zeta_2$, then $\zeta$ is not integer, which contradicts Theorem 2.18. □

Clearly, for an integer vector $\zeta \in \mathcal{Z}$ the linearization $P_\zeta$ coincides with the extension of the discrete mapping $p_\zeta$ (34). It follows that the unique optimal solution to (L1) is the solution to (max-wi).

2.8. Perturbation for Strong Persistency

Problem (max-si) over $\mathcal{V}^{2,y}$ can be reduced to (max-wi) with a perturbed cost vector $\tilde f$ as follows. It is sufficient to show that the dual representation (32) of the constraint $[p] \in \mathcal{S}_f$ can be reduced to that of $[p] \in \mathcal{W}_f$.

For $p \in \mathcal{V}^{2,y}$ we can choose the components of the vector $h$ in the dual representation (32) of $\mathcal{S}_f$ as

$$h_C(x_C) = g_C(p_C(x_C)) - g_C(x_C), \tag{48a}$$

$$g_C(x_C) = \sum_{s \in C} [\![\, x_s = y_s \,]\!]. \tag{48b}$$

Clearly, $p_C(x_C) = x_C$ iff $h_C(x_C) = 0$, and for $p_C(x_C) \neq x_C$ there holds $1 \le h_C(x_C) \le |C|$. With such a vector $h$ the dual representation of $\mathcal{S}_f$ can be written as

$$(f + \varepsilon g) - [p]^{\mathsf T} (f + \varepsilon g) - A^{\mathsf T}\varphi \ge 0, \tag{49}$$

i.e., the same constraint as (29) must hold, but for an $\varepsilon$-perturbed cost vector

$$\tilde f := f + \varepsilon g. \tag{50}$$

Since the solution $\zeta$ to the perturbed problem is integer and unique, it is the optimal solution to (max-si).
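The claimed range of $h$ can be checked exhaustively on a small hyperedge. The Python sketch below is our own illustration with hypothetical names: it implements $g_C$ from (48b) and $h_C$ from (48a) for the map (34), where `zeta[s][i]` stands for $\zeta_{s,i}$ of the node at position `s` in $C$.

```python
from itertools import product


def g(xC, yC):
    """(48b): number of nodes of C whose label equals the test label y_s."""
    return sum(int(xs == ys) for xs, ys in zip(xC, yC))


def h(xC, yC, zeta):
    """(48a): h_C(x_C) = g_C(p_C(x_C)) - g_C(x_C) with p the subset-to-one map (34)."""
    pC = tuple(xs if zeta[s][xs] == 1 else ys for s, (xs, ys) in enumerate(zip(xC, yC)))
    return pC, g(pC, yC) - g(xC, yC)


def check_range(num_labels, yC):
    """Verify: h = 0 iff p_C(x_C) = x_C, and otherwise 1 <= h <= |C|."""
    n = len(yC)
    for bits in product((0, 1), repeat=n * num_labels):
        zeta = [bits[s * num_labels:(s + 1) * num_labels] for s in range(n)]
        for xC in product(range(num_labels), repeat=n):
            pC, hC = h(xC, yC, zeta)
            ok = hC == 0 if pC == xC else 1 <= hC <= n
            if not ok:
                return False
    return True


print(check_range(3, (0, 2)))  # True
```

Every coordinate moved by $p$ goes from a label $x_s \neq y_s$ to $y_s$ and increases $g_C$ by exactly one, which is why $h_C$ counts the moved coordinates.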

2.9. Two-Phase Method 

Let us consider the class $\mathcal{V}^{1,y}$. Formulation (L1) can be adapted by incorporating additional constraints on $\zeta$ (making the variables $\zeta_{s,x_s}$ equal for all $x_s$). The proof of Theorem 2.18 is based on the fact that for a feasible $\zeta$ also $\zeta^2$ is feasible. Clearly, this property is not destroyed by any equality constraints between components of $\zeta$. Therefore Theorem 2.18 continues to hold, and thus both (max-si) and (max-wi) problems over $\mathcal{V}^{1,y}$ are tractable.

For the class $\mathcal{V}^1$ the problem (max-si) can be solved as proposed by Algorithm 1. It first solves the LP relaxation in order to determine the test labeling $y$ and then solves the (max-si) problem for fixed $y$ using the perturbed (L1) for the class $\mathcal{V}^{1,y}$.

Algorithm 1: Two-Phase Method

1 $\mu \in \operatorname{argmin}_{\mu \in \Lambda} \langle f, \mu \rangle$; /* solve (LP) */
2 For all $s$: if there exists $i \in \mathcal{X}_s$ such that $\mu_s(i) = 1$, then set $y_s = i$, otherwise set $y_s$ arbitrarily;
3 For strong persistency apply perturbation (50);
4 Solve the problem (L1) for the class of maps $\mathcal{V}^{1,y}$ or $\mathcal{V}^{2,y}$;


Theorem 2.20. Algorithm 1 solves (max-si) over $\mathcal{V}^1$.

Proof. The necessary conditions of Lemma 2.7 for the optimal solution of the LP relaxation require that a strictly improving mapping does not change optimal relaxed solutions. From the component-wise condition (28b) it follows that when $\mu_s$ is fractional for some $s$, then $p_s$ (assuming $p \in \mathcal{V}^1$) must be the identity. When $\mu_s$ is integer, the only possible value of $y_s$ qualifying the necessary conditions must correspond to $\mu_s(y_s) = 1$. Applying the perturbation in step 3 and optimizing over $\mathcal{V}^{1,y}$ in step 4, we obtain the optimal solution. □

As a general heuristic, we can apply the same two-phase method, optimizing in step 4 over $\mathcal{V}^{1,y}$ or $\mathcal{V}^{2,y}$, with or without perturbation. The persistent assignment found by the heuristic is guaranteed to be at least as large as the solution of (max-si) over $\mathcal{V}^1$.

3. Local LP Relaxations 

In this section we consider a special case of local (or standard) LP relaxations in energy minimization [59, 58, 11, 38, 70]; see also the survey by Werner [72]. In our notation, local relaxations are described by the polytope $\Lambda$ of the form

$$\Lambda = \{ \mu \in \mathbb{R}^{\mathcal{I}} \mid A\mu = 0;\; \mu_\emptyset = 1;\; \mu \ge 0 \}. \tag{51}$$

Recall that in the embedding $\delta$, different components of a relaxed labeling $\mu$, e.g., $\mu_C$ and $\mu_D$ for $D \subset C$, represent overlapping subsets of variables. In order to represent all discrete labelings consistently, they must satisfy marginalization constraints of the form

$$(\forall x_D \in \mathcal{X}_D)\quad \sum_{x_{C \setminus D}} \mu_C(x_C) = \mu_D(x_D). \tag{52}$$

Werner [74] considers a family of LP relaxations generated by enforcing constraint (52) for some pairs of subsets $C \in \mathcal{E}$, $D \subset C$. The set of such pairs is called the coupling structure [74]. For $C, D \subset V$ we define the coupling relation $D \prec C$ of order $d$: let $D \prec C$ iff

$$D \subsetneq C,\quad C, D \in \mathcal{E},\quad |D| \le d. \tag{53}$$

Subsequently, we will consider two possibilities: to include only first-order constraints or all of them. Zero-order constraints (52) define just the normalization:

$$\sum_{x_C} \mu_C(x_C) = \mu_\emptyset.$$

Together with non-negativity they guarantee boundedness (which was assumed in the general case, §1.3). The first-order constraints (52) add marginalization constraints of the form

$$(\forall s \in C,\; \forall x_s \in \mathcal{X}_s)\quad \sum_{x_{C \setminus \{s\}}} \mu_C(x_C) = \mu_s(x_s). \tag{54}$$

And so on. By specifying a larger $d$, we introduce more coupling between relaxed variables.
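For a pairwise hyperedge $C = \{s, t\}$ the zero- and first-order constraints can be evaluated directly. The Python sketch below (our own illustration; `marginalization_residuals` is a hypothetical name) returns the residuals of the normalization and of (54) for both endpoints; all residuals vanish exactly when $\mu_C$ is consistent with $\mu_s$, $\mu_t$ and $\mu_\emptyset$.

```python
from typing import List, Tuple


def marginalization_residuals(mu_C: List[List[float]], mu_s: List[float], mu_t: List[float],
                              mu_empty: float = 1.0) -> Tuple[float, List[float], List[float]]:
    """Residuals of (52) for C = {s, t}: order 0 (normalization) and order 1, i.e. (54).

    mu_C[i][j] is mu_C(x_s = i, x_t = j).
    """
    zero_order = sum(sum(row) for row in mu_C) - mu_empty
    first_s = [sum(mu_C[i]) - mu_s[i] for i in range(len(mu_s))]
    first_t = [sum(row[j] for row in mu_C) - mu_t[j] for j in range(len(mu_t))]
    return zero_order, first_s, first_t


mu_C = [[0.5, 0.0], [0.0, 0.5]]
print(marginalization_residuals(mu_C, [0.5, 0.5], [0.5, 0.5]))  # (0.0, [0.0, 0.0], [0.0, 0.0])
print(marginalization_residuals(mu_C, [0.5, 0.5], [1.0, 0.0]))  # (0.0, [0.0, 0.0], [-0.5, 0.5])
```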


Note that any relaxation in the form (51) is local, i.e., tied to the hypergraph. We cannot add more facets (inequalities) without increasing the number of variables, and the variables are defined by the fixed embedding $\delta$. Tightening the relaxation is thus only possible by enlarging the hypergraph (adding zero interactions, as in [72]), which results in an exponential increase in the number of relaxed variables. An example of a non-local relaxation is the cutting plane method [60], which progressively adds facet-defining inequalities coupling many variables at a time. While the general results of §2 are applicable to it, the local representation would not be tractable.

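The primal and dual LP relaxation problems for the coupling $\prec$ are expressed as follows:

$$\begin{array}{ll} \min\; \langle f, \mu \rangle & = \;\max\; \psi \\ (\forall D \prec C)\;\; \sum_{x_{C \setminus D}} \mu_C(x_C) = \mu_D(x_D), & \varphi_{D,C}(x_D) \in \mathbb{R}, \\ \mu_\emptyset = 1, & \psi \in \mathbb{R}, \\ (\forall C \in \mathcal{E},\; \forall x_C)\;\; \mu_C(x_C) \ge 0, & f^\varphi_C(x_C) \ge \begin{cases} 0, & C \neq \emptyset, \\ \psi, & C = \emptyset. \end{cases} \end{array}$$

Matrix $A$ corresponds to the primal equality constraints. The vector $f^\varphi = f - A^{\mathsf T}\varphi$ is an equivalent transformation [59] or reparametrization [70] of $f$. Its components are expressed as

$$f^\varphi_C(x_C) = f_C(x_C) - \sum_{D \prec C} \varphi_{D,C}(x_D) + \sum_{H \succ C} \varphi_{C,H}(x_C). \tag{55}$$

In particular, the components $f^\varphi_\emptyset$ and $f^\varphi_s(x_s)$ are expressed as

$$f^\varphi_\emptyset = f_\emptyset + \sum_{C \succ \emptyset} \varphi_{\emptyset,C}; \tag{56a}$$

$$f^\varphi_s(x_s) = f_s(x_s) - \varphi_{\emptyset,s} + \sum_{H \succ \{s\}} \varphi_{s,H}(x_s). \tag{56b}$$

For any $\varphi \in \mathbb{R}^m$ there holds

$$(\forall \mu \in \Lambda)\quad \langle f^\varphi, \mu \rangle = \langle f, \mu \rangle - \langle \varphi, A\mu \rangle = \langle f, \mu \rangle. \tag{57}$$

Since $\delta(\mathcal{X}) \subset \Lambda$, it follows that $E_f(x) = E_{f^\varphi}(x)$ for all $x \in \mathcal{X}$. Hence $f^\varphi$ is indeed equivalent to $f$ in defining the energy function.

The dual problem can be equivalently written as

$$\max \{ f^\varphi_\emptyset \mid (\forall C \neq \emptyset)\; f^\varphi_C(x_C) \ge 0,\; \varphi \in \mathbb{R}^m \}, \tag{58}$$

so we can speak of the dual solution as just $\varphi$.

Complementary slackness. Complementary slackness for (LP) states that a feasible primal-dual pair $(\mu, \varphi)$ is optimal iff $(\forall C \in \mathcal{E} \setminus \{\emptyset\},\; \forall x_C)$

$$\mu_C(x_C) > 0 \;\Rightarrow\; f^\varphi_C(x_C) = 0. \tag{59}$$

Because a feasible dual solution satisfies $f^\varphi_C \ge 0$, the condition on the RHS of (59) implies that the assignment $x_C$ is locally minimal for $f^\varphi_C$: $x_C \in \operatorname{argmin} f^\varphi_C(\cdot)$.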
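The invariance (57) of the energy under reparametrization is easy to check numerically. The Python sketch below is our own illustration for a pairwise model with node-edge couplings only (the terms $\varphi_{\emptyset,C}$ are omitted for brevity): it applies (55) and (56b) with a random $\varphi$ and compares $E_f(x)$ and $E_{f^\varphi}(x)$ on all labelings. The dict encodings are hypothetical.

```python
import random
from itertools import product


def energy(unary, pairwise, x):
    """E_f(x) = sum_s f_s(x_s) + sum_{(s,t)} f_st(x_s, x_t)."""
    return (sum(f[x[s]] for s, f in unary.items())
            + sum(f[x[s]][x[t]] for (s, t), f in pairwise.items()))


def reparametrize(unary, pairwise, phi):
    """f^phi by (55)/(56b): f_st - phi_{s,st}(x_s) - phi_{t,st}(x_t) and f_s + phi_{s,st}(x_s)."""
    new_unary = {s: list(f) for s, f in unary.items()}
    new_pair = {}
    for (s, t), f in pairwise.items():
        a, b = phi[(s, (s, t))], phi[(t, (s, t))]
        new_pair[(s, t)] = [[f[i][j] - a[i] - b[j] for j in range(len(b))] for i in range(len(a))]
        for i, v in enumerate(a):
            new_unary[s][i] += v
        for j, v in enumerate(b):
            new_unary[t][j] += v
    return new_unary, new_pair


rng = random.Random(1)
K, edges = 3, [(0, 1), (1, 2)]
unary = {s: [rng.uniform(-1, 1) for _ in range(K)] for s in range(3)}
pairwise = {e: [[rng.uniform(-1, 1) for _ in range(K)] for _ in range(K)] for e in edges}
phi = {(v, e): [rng.uniform(-1, 1) for _ in range(K)] for e in edges for v in e}

u2, p2 = reparametrize(unary, pairwise, phi)
gap = max(abs(energy(unary, pairwise, x) - energy(u2, p2, x)) for x in product(range(K), repeat=3))
print(gap < 1e-12)  # True
```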

Strict Complementarity. Let $(\mu, \varphi)$ be a feasible primal-dual pair for (LP). This pair is called strictly complementary if

$$\mu_C(x_C) > 0 \;\Leftrightarrow\; f^\varphi_C(x_C) = 0. \tag{60a}$$

Clearly, a strictly complementary pair is complementary and thus optimal. Such a pair always exists and can be found by interior point algorithms (see, e.g., [68]). It is known that $\mu$ is a relative interior point of the primal optimal face and $\varphi$ is a relative interior point of the dual optimal face.

Arc Consistency. The following conditions, known as arc consistency (AC, e.g., [74]), are satisfied for strictly complementary pairs:

• If $f^\varphi_C(x_C) = 0$ then $(\forall D \prec C)$ $f^\varphi_D(x_D) = 0$.

• If $f^\varphi_D(x_D) = 0$ then $(\forall C \succ D)$ $(\exists x'_C \in \mathcal{X}_C \mid x'_D = x_D)$ $f^\varphi_C(x'_C) = 0$.

These conditions say that the set of local minimizers must be consistent over overlapping hyperedges. Arc consistency is a necessary but, in general, not sufficient condition for strict complementarity.

BLP. The relaxation with marginalization constraints of order 1 is known as the Basic LP relaxation (BLP) [74]. Note that if we do not enforce marginalization constraints of at least order 1, there may occur integer feasible solutions to the relaxation which are not consistent, i.e., do not correspond to a global assignment. Out of all local relaxations, BLP is the least constrained useful one. It is remarkable that it is tight for all tractable languages [66, 67, 30]. However, for certain purposes BLP is not sufficient, as can be illustrated with pseudo-Boolean functions. Suppose we would like to express a pseudo-Boolean function of 3 variables as a cubic polynomial. We know it can be expressed in this form; however, such a desired equivalent transformation of the problem turns out not to be equivalent for BLP, and hence not equivalent for the maximum persistency problem. Another difficulty is that fixing a variable to its optimal value is not the same as eliminating this variable. The example in Figure 7 illustrates that eliminating a persistent variable tightens the relaxation. It follows that under the BLP relaxation we will be able to compare theoretically neither to quadratization techniques (as they perform general equivalent transformations) nor to generalized roof duality [23], which incrementally eliminates persistent variables.





Figure 7: An example when fixing a variable in the BLP relaxation is not equivalent to eliminating it. (a) Energy in 3 variables (the leftmost variable has only one possible assignment). Costs 2 and 3 are assigned to the pairwise term (solid lines) and 1 and 4 to the triplewise term (faces); other costs are zero. (b) An optimal solution to the BLP relaxation (of cost 0). The BLP relaxation does not enforce marginalization between the triple and the pair. (c) The energy after elimination of the dummy variable. Now the BLP relaxation is tight and can determine the optimal labeling of cost 1.

FLP. The relaxation with all marginalization constraints present will be referred to as the Full local LP relaxation (FLP). For every hyperedge, all its subsets are assumed to be contained in $\mathcal{E}$, and all constraints of the form (52) with $d$ equal to the order of the problem are included. In the case of a pairwise model, individual nodes are the only proper non-empty subsets of edges, and hence BLP and FLP are the same. In the pseudo-Boolean case, FLP matches the relaxation of Sherali and Adams [58], as discussed in §A.9.

3.1. Maximum Persistency with Local Relaxations 

In this section we summarize how the general construction and formulation of (L1) simplify for local relaxations. First, the constraint $P_\zeta(\Lambda) \subset \Lambda$ holds automatically and need not be enforced. This is shown in two steps: first we consider the linear extension $[p]$ of any node-wise mapping $p$, and then the linearized mapping $P_\zeta$, $\zeta \in \mathcal{Z}$.

Lemma 3.1. Node-wise mapping $[p]$ preserves the local polytope $\Lambda$. Proof on p. 27.

Lemma 3.2. Mapping $P_\zeta$ for $\zeta \in \mathcal{Z}$ preserves the local polytope: $P_\zeta(\Lambda) \subset \Lambda$. Proof on p. 27.

In short, $P_\zeta$ satisfies all the equality constraints satisfied by $[p]$ and has all components non-negative for $\zeta \in \mathcal{Z}$. As a consequence of Lemma 3.2, the constraint of polytope preservation (45b) in the maximum persistency problem (L1) can be dropped. We can write (max-wi) as

$$\min \sum_{s,i} \zeta_{s,i} \tag{61}$$

$$(I - P_\zeta^{\mathsf T}) f - A^{\mathsf T}\varphi \ge 0; \quad \varphi \in \mathbb{R}^m;$$

$$\zeta \in \mathcal{Z}.$$

Further properties of improving mappings for local relaxations are as follows. Conditions that are necessary for $[p] \in \mathcal{W}_f$ in the general case (Theorem 2.10) become necessary and sufficient for local relaxations and can now be summarized together with the dual representation of Theorem 2.8:

Theorem 3.3 (Characterizations). For a local relaxation $\Lambda$ all of the following are equivalent:

(a) $P \in \mathcal{W}_f$;

(b) $(\exists \varphi \in \mathbb{R}^m)\;\; f^\varphi - P^{\mathsf T} f \ge 0$;

(c) $(\exists \varphi \in \mathbb{R}^m)\;\; (I - P^{\mathsf T}) f^\varphi \ge 0$;

(d) $\inf \{ \langle f - P^{\mathsf T} f, \mu \rangle \mid \mu \in \mathbb{R}^{\mathcal{I}}_+,\; A(I - P)\mu = 0 \} = 0$.

Proof on p. 28.

We have transitions from a global property $(\forall \mu \in \Lambda)\; \langle (I - P^{\mathsf T}) f, \mu \rangle \ge 0$ (a) to component-wise local inequalities (b) and (c). Inequalities (c) offer an equivalent reparametrization $f^\varphi$ in which the mapping $p$ improves every component independently:

$$(\forall C \in \mathcal{E},\; \forall x_C \in \mathcal{X}_C)\quad f^\varphi_C(p_C(x_C)) \le f^\varphi_C(x_C). \tag{62}$$

This is a fairly simple condition, similar in spirit to the idea of equivalent transformations by Shlezinger [59] (find an equivalent $f^\varphi$ such that the global minimum may be recovered from independent component-wise minima). Condition (d) is a primal reformulation which has fewer equality constraints than the verification LP and hence is simpler.
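Given a reparametrization $f^\varphi$, condition (62) is a finite component-wise test. The Python sketch below is our own illustration: hyperedges are tuples of node ids, `f_phi[C]` maps label tuples $x_C$ to costs, and `p[s]` is a dict encoding $p_s$ (all hypothetical encodings). Finding a suitable $\varphi$ is an LP and is not attempted here.

```python
def improves_componentwise(f_phi, p):
    """Check (62): f^phi_C(p_C(x_C)) <= f^phi_C(x_C) for every hyperedge C and labeling x_C."""
    for C, table in f_phi.items():
        for xC, cost in table.items():
            pxC = tuple(p[s][xs] for s, xs in zip(C, xC))
            if table[pxC] > cost:
                return False
    return True


# two binary nodes and one edge; p maps label 0 of node 0 to label 1
p = {0: {0: 1, 1: 1}, 1: {0: 0, 1: 1}}
f_phi = {
    (0,): {(0,): 2.0, (1,): 0.0},
    (1,): {(0,): 0.0, (1,): 1.0},
    (0, 1): {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 0.0},
}
print(improves_componentwise(f_phi, p))  # True
```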

Some properties expressed for all hyperedges $C \in \mathcal{E}$ can be simplified if we assume at least the BLP relaxation. In Statement 2.6 it is sufficient that only the unary components satisfy $(\forall s \in V)\; O_s = p_s(\mathcal{X}_s)$. For other components the constraint is implied by marginalization. For the same reason, in the perturbation (50) it is sufficient to have only the unary components $f_s(y_s)$ increased by $\varepsilon$ for all $s$ and to leave higher-order terms intact.

Lastly, there are the following NP-hardness results with the BLP relaxation:

Theorem 3.4. Problem (max-wi) over the $\mathcal{V}^1$ class of maps and the BLP relaxation is solvable in polynomial time for the quadratic pseudo-Boolean case and otherwise (when the problem is multilabel or higher order) it is NP-hard. Proof on p. 28.

Theorem 3.5. Problem (max-si) with 4 or more labels over the class of maps $\mathcal{V}^2 = \bigcup_y \mathcal{V}^{2,y}$ and the BLP relaxation is NP-hard. Proof on p. 28.

We see that the difference between weak and strong persistency leads to different complexity classes for the maximum persistency problem. The question of the complexity of (max-si) with 3 labels is not resolved.

4. Comparison Theorems 

This section is devoted to the theoretical comparison between different persistency techniques. The first result is the following:

Theorem 4.1. Let $\Lambda \subset \Lambda'$ and let $P$ be a $\Lambda'$-improving mapping for $f$. Then $P$ is $\Lambda$-improving for $f$.

Proof. The claim follows from Definition 2.2 and the nesting of polytopes $\Lambda \subset \Lambda'$. □

We therefore have a natural hierarchy: if we can identify some variables as persistent by the proposed sufficient condition with relaxation $\Lambda'$, then for any tighter relaxation $\Lambda \subset \Lambda'$ we are guaranteed to find at least the same persistent variables. Other nesting results under different reformulations of the problem are obtained in §4.6 and §A.7.

Table 3 gives an overview of the obtained comparisons to other methods. The first comparison column establishes that all listed methods correspond to a relaxed-improving mapping under standard relaxations (recall that in the pairwise case FLP = BLP). For cases when Algorithm 1 is optimal, as indicated in Table 1, it is guaranteed to find the same or a larger set of persistent labels than any other method. This fills the second comparison column in Table 3. In the remainder of this section we give a more detailed overview of the different methods and comparison results.

Method | Corresponds to a BLP/FLP-improving mapping | Dominated by Algorithm 1
Simple DEE [17] | ✓ | -
MQPBO [25] | ✓ | -
Kovtun's one-against-all [39] | ✓ | ✓
Kovtun's iterative [40] | ✓ | -
Swoboda et al. [64]** | ✓ | ✓
Roof dual / QPBO [44, 20, 6, 32] | ✓ | ✓*
Reductions: HOCR [22], Fix et al. [15] | FLP | ✓*
Bisubmodular relaxations [27]*** | BLP | ✓*
Generalized Roof Duality [23] | FLP | ✓*
Persistency by Adams et al. [2] | FLP | ✓*

Table 3: Summary of theoretical comparisons. *Result holds for the strong persistency variants and, respectively, the strict version of Algorithm 1. **[64] is higher order, but the comparison proof is for the pairwise case. ***Result holds for a sum of bisubmodular functions over the same hypergraph as the BLP relaxation.

4.1. DEE

We will consider Goldstein's simple DEE [17] (which is stronger than the original DEE by Desmet et al. [14]) in the pairwise setting. For every node $s$ this method considers its neighbors in the graph, $\mathcal{N}(s)$, and for a pair of labels $\alpha, \beta \in \mathcal{X}_s$ verifies the condition

$$(\forall x \in \mathcal{X}_{\mathcal{N}(s)})\quad f_s(\alpha) - f_s(\beta) + \sum_{t \in \mathcal{N}(s)} \big[ f_{\{s,t\}}(\alpha, x_t) - f_{\{s,t\}}(\beta, x_t) \big] \ge 0, \tag{63}$$

illustrated in Figure 8. If the condition is satisfied, it means that a (weakly) improving switch from $\alpha$ to $\beta$ exists for an arbitrary labeling $x$. In this case $(s, \alpha)$ can be eliminated while preserving at least one optimal assignment.

Figure 8: Improving mapping corresponding to an individual DEE condition. The full DEE method iterates over all nodes and labels and composes the found improving maps.
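Because the pairwise terms in (63) are separable over the neighbors, the quantifier over $x \in \mathcal{X}_{\mathcal{N}(s)}$ can be replaced by a sum of per-neighbor minima. The Python sketch below is our own illustration with hypothetical encodings: `pairwise[(s, t)][a][b]` is $f_{\{s,t\}}(a, b)$ with the label of $s$ first, and `strict=True` tests the strict version of (63) that yields $[p] \in \mathcal{S}_f$.

```python
def dee_eliminates(s, alpha, beta, unary, pairwise, neighbors, strict=False):
    """Goldstein's simple DEE test (63): can label alpha of node s be switched to beta?"""
    # worst case over all neighbor labelings = sum of independent per-neighbor minima
    worst = unary[s][alpha] - unary[s][beta]
    for t in neighbors[s]:
        f = pairwise[(s, t)]
        worst += min(fa - fb for fa, fb in zip(f[alpha], f[beta]))
    return worst > 0 if strict else worst >= 0


unary = {0: [3.0, 1.0]}
pairwise = {(0, 1): [[0.0, 1.0], [0.0, 0.0]]}
neighbors = {0: [1]}
print(dee_eliminates(0, 0, 1, unary, pairwise, neighbors))  # True: label 0 of node 0 is dominated
print(dee_eliminates(0, 1, 0, unary, pairwise, neighbors))  # False
```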

It is trivial to construct an improving mapping for this case. We let $p_s(\alpha) = \beta$, $p_s(i) = i$ for $i \neq \alpha$, and $p_t(i) = i$ for all $t \neq s$. The non-zero terms of the problem $g = (I - P^{\mathsf T}) f$ form a tree with root node $s$ and the other nodes $t \in \mathcal{N}(s)$ being leaves. It is known that in this case the FLP relaxation is tight and therefore $p$ is FLP-improving. Similarly, the strict inequality in (63) implies $[p] \in \mathcal{S}_f$.

4.2. QPBO 

Let $\mathcal{X}_s = \mathbb{B}$. The weak persistency theorem [44, 20] can be formulated as follows. Let $\mu \in \operatorname{argmin}_{\mu \in \Lambda} \langle f, \mu \rangle$ and let $O_s = \{ i \in \mathbb{B} \mid \mu_s(i) > 0 \}$. Then

$$(\exists x \in \operatorname{argmin}_x E_f(x))\;\; (\forall s \in V)\quad x_s \in O_s. \tag{64}$$

In the case $|O_s| = 1$ the vector $\mu_s$ is necessarily integer, and the theorem states that there is an optimal solution $x$ to the discrete problem which is consistent with the integer part of the relaxed solution $\mu$. The largest weakly persistent assignment is obtained when $\mu$ is the solution with the maximum number of integer components.

The strong persistency theorem [44, 20] can be formulated as follows.

Theorem 4.2 ([44, 20]). Let $(\mu, \varphi)$ be a strictly complementary primal-dual pair and let $O_s$ be defined as above. Then

$$(\forall x \in \operatorname{argmin}_x E_f(x))\;\; (\forall s \in V)\quad x_s \in O_s. \tag{65}$$

The difference to (64) is in the quantifier: $\forall$ vs. $\exists$. Note that a strictly complementary solution has the minimum number of integer components.

Theorem 4.3 ([55]). Weak (resp. strong) persistency by QPBO corresponds to an FLP-improving (resp. strict FLP-improving) mapping.

The mapping is defined by $p_s(i) = 0$ if $O_s = \{0\}$, $p_s(i) = 1$ if $O_s = \{1\}$, and $p_s(i) = i$ otherwise. The idea of the proof is to show that the dual optimal solution $\varphi$ provides the reparametrization in which the mapping improves every component independently, i.e., satisfies the inequalities of the characterization Theorem 3.3(c).

It follows from the theorem that the solution by Algorithm 1 with perturbation coincides with the strong QPBO persistency.
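The node-wise map of Theorem 4.3 is determined by the supports $O_s$ alone. A minimal Python sketch, assuming the relaxed unary marginals $\mu_s = (\mu_s(0), \mu_s(1))$ are already available from an LP or QPBO solver (not shown); the tolerance used to decide $\mu_s(i) > 0$ is our own choice.

```python
from typing import Dict, Tuple


def qpbo_map(mu_unary: Dict[str, Tuple[float, float]], tol: float = 1e-9) -> Dict[str, Dict[int, int]]:
    """p_s = constant 0 if O_s = {0}, constant 1 if O_s = {1}, identity otherwise."""
    p = {}
    for s, mu_s in mu_unary.items():
        O_s = {i for i in (0, 1) if mu_s[i] > tol}
        if O_s == {0}:
            p[s] = {0: 0, 1: 0}
        elif O_s == {1}:
            p[s] = {0: 1, 1: 1}
        else:
            p[s] = {0: 0, 1: 1}
    return p


print(qpbo_map({"a": (1.0, 0.0), "b": (0.0, 1.0), "c": (0.5, 0.5)}))
# {'a': {0: 0, 1: 0}, 'b': {0: 1, 1: 1}, 'c': {0: 0, 1: 1}}
```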

4.3. MQPBO 

The MQPBO method [25] extends the partial optimality properties of QPBO to multilabel problems via a reduction of the problem to 0-1 variables. The reduction, known as the "K to 2" transform [51] ($K = |\mathcal X_s|$), depends on the linear ordering of labels in $\mathcal X_s$. The method outputs two labelings, $x^{\min}$ and $x^{\max}$, with the guarantee that there exists an optimal labeling $x$ that satisfies $x_s \in [x^{\min}_s, x^{\max}_s]$. The corresponding improving mapping has the form $p\colon x \mapsto (x \vee x^{\min}) \wedge x^{\max}$. Here $\vee$ and $\wedge$ are the component-wise maximum and minimum, respectively, in the given ordering of labels. The mapping is illustrated in Figure 9. The reduction is component-wise, and component-wise inequalities hold for QPBO. It follows that the component-wise conditions of Theorem 3.3(c) hold for $p$ (proof in [55]). For $(x^{\min}, x^{\max})$ obtained from weak (resp. strong) persistency by QPBO, there holds $[p] \in \mathbb W_f$ (resp. $[p] \in \mathbb S_f$). The class of mappings of the form $x \mapsto (x \vee x^{\min}) \wedge x^{\max}$ is not among the cases for which Algorithm 1 is optimal. The question of tractability of (max-si) for this class therefore remains open.

Figure 9: Improving mapping found by the MQPBO method: all labels above $x^{\max}$ are mapped to $x^{\max}$ and all labels below $x^{\min}$ are mapped to $x^{\min}$. Labelings $x^{\max}$ and $x^{\min}$ are determined by the method from the underlying roof dual relaxation.
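The MQPBO mapping $x \mapsto (x \vee x^{\min}) \wedge x^{\max}$ is a per-node clamp. A short sketch follows. It assumes that labels are integers $0, \dots, K-1$ in the chosen linear order and that $x^{\min} \le x^{\max}$ component-wise. The name `mqpbo_map` is hypothetical.

```python
def mqpbo_map(x, x_min, x_max):
    """p(x) = (x v x_min) ^ x_max, with v = max and ^ = min taken per node
    in the (assumed) linear order of integer labels."""
    return [min(max(xs, lo), hi) for xs, lo, hi in zip(x, x_min, x_max)]


x_min, x_max = [1, 0, 0, 2], [2, 2, 1, 3]
print(mqpbo_map([0, 3, 1, 2], x_min, x_max))  # [1, 2, 1, 2]
```

Every output lies in the box $[x^{\min}, x^{\max}]$, and applying the map twice gives the same result as applying it once.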

4.4. Auxiliary Submodular Problems by Kovtun

Several methods were proposed [39, 40], which differ in details. All of them construct an auxiliary submodular energy $E_g$. A minimizer $y$ of $E_g$ has the property that $E_g(x \vee y) \le E_g(x)$, which is implied by submodularity. It follows that the mapping $p\colon x \mapsto x \vee y$ is improving for $g$. Figure 10 illustrates such mappings found by two of the methods in [40]. In case (a), the test labeling $y$ must be the highest (the maximum) in the selected order of labels. It follows that the mapping $p$ is essentially of the form $x \mapsto x[A \leftarrow y]$, i.e., it belongs to the class $\mathcal P^{1,y}$. The construction of the auxiliary function ensures that the improvement in $f$ is at least as big as the improvement in $g$, and so $p$ is improving for $f$.
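The key property can be checked by brute force on a tiny example: for a submodular $E_g$ with minimizer $y$, $E_g(x \vee y) \le E_g(x)$. The argument is short. Submodularity gives $E_g(x \vee y) + E_g(x \wedge y) \le E_g(x) + E_g(y)$, and $E_g(y) \le E_g(x \wedge y)$ because $y$ is a minimizer.

The sketch below is for binary labels with $0 < 1$, so $\vee$ is the component-wise maximum. The small pairwise energy stands in for $E_g$. Kovtun's construction of $g$ from $f$ is not reproduced, and all names and numbers are illustrative.

```python
import itertools


def energy(x, unary, pairwise):
    """E(x) = sum_s unary[s][x_s] + sum_{(s,t)} pairwise[(s,t)][x_s][x_t], x binary."""
    value = sum(unary[s][xs] for s, xs in enumerate(x))
    value += sum(w[x[s]][x[t]] for (s, t), w in pairwise.items())
    return value


def is_submodular(pairwise):
    """A binary pairwise term w is submodular iff w00 + w11 <= w01 + w10."""
    return all(w[0][0] + w[1][1] <= w[0][1] + w[1][0] for w in pairwise.values())


def join_is_improving(unary, pairwise):
    """Brute force: with y a minimizer of E, check E(x v y) <= E(x) for all x."""
    n = len(unary)
    labelings = list(itertools.product((0, 1), repeat=n))
    y = min(labelings, key=lambda x: energy(x, unary, pairwise))
    return all(
        energy(tuple(max(a, b) for a, b in zip(x, y)), unary, pairwise)
        <= energy(x, unary, pairwise)
        for x in labelings
    )


unary = [[0, 2], [3, -1], [1, 1], [-2, 0]]
pairwise = {(0, 1): [[0, 3], [2, 0]], (1, 2): [[0, 1], [4, -1]], (2, 3): [[1, 2], [2, 0]]}
print(is_submodular(pairwise), join_is_improving(unary, pairwise))  # True True
```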

Theorem 4.4 ([55]). Persistency by any of the methods [39, 40] corresponds to an FLP-improving mapping.

Since in case (a) the mapping is in the class $\mathcal P^1$, the strict version of the method is dominated by Algorithm 1. In case (b), the class of maps is a subset of the maps considered in MQPBO, and the tractability of the (max-si) problem is also open.

Computationally, the methods of Kovtun [40] have an advantage because they rely on the minimization of a pairwise submodular function. For the Potts model, the method [39] can be carried out for all "flat" test labelings $y_s = \alpha$, $\alpha = 1, \dots, K$ ($K = |\mathcal X_s|$), using only $\log(K)$ maximum flow computations [19]. It is very practical in some vision problems where the unary costs are dominant (e.g., results [39, 3, 19]). At the same time, experiments on difficult random problems in §5 reveal very poor performance of this method.

Figure 10: Improving mappings in Kovtun's methods. (a) One-against-all method for a fixed test labeling $y$. The method determines a subset $A$ of vertices for which the optimal labeling is $y$. (b) Iterative method [40], in which the labeling $y$ is found incrementally and with respect to a predefined ordering of labels.

Figure 11: Improving mapping in the method of Swoboda et al. [64]. The method finds the labeling $y$ and a subset $A$. Outside $A$ the mapping is the identity.

4.5. Iterative Pruning by Swoboda et al.

The Iterative Pruning method was first proposed [63] for the Potts model and later extended to general pairwise and higher-order energies [64]. The method can be interpreted as finding an improving mapping in the class $\mathcal P^1$ (Figure 11).
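For a map in the class $\mathcal P^1$, $x \mapsto x[A \leftarrow y]$, "improving" means $E(x[A \leftarrow y]) \le E(x)$ for every labeling $x$. A map with this property guarantees that some optimal labeling takes the values $y$ on $A$.

The brute-force check below illustrates this definition on a 3-node Potts chain. It is not the Iterative Pruning algorithm of [64], which certifies the property through an LP relaxation instead of enumeration. The energy, the helper names and the numbers are hypothetical.

```python
import itertools


def potts_energy(x, unary, edges):
    """Unary costs plus Potts terms weight * [x_s != x_t]."""
    value = sum(unary[s][xs] for s, xs in enumerate(x))
    value += sum(w for (s, t), w in edges.items() if x[s] != x[t])
    return value


def substitute(x, A, y):
    """x[A <- y]: take y_s for s in A, keep x_s elsewhere."""
    return tuple(y[s] if s in A else xs for s, xs in enumerate(x))


def is_improving(A, y, unary, edges, K):
    """Brute force: E(x[A <- y]) <= E(x) for every labeling x."""
    n = len(unary)
    return all(
        potts_energy(substitute(x, A, y), unary, edges) <= potts_energy(x, unary, edges)
        for x in itertools.product(range(K), repeat=n)
    )


unary = [[10, 10, 0], [0, 0, 0], [0, 5, 5]]
edges = {(0, 1): 1, (1, 2): 1}
print(is_improving({0}, (2, 0, 0), unary, edges, 3))  # True: x_0 = 2 is persistent
print(is_improving({1}, (0, 1, 0), unary, edges, 3))  # False
```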

Theorem 4.5. Persistency by the method [64] in the pairwise multilabel case corresponds to an FLP-improving mapping. Proof on p. 28.

In fact, the optimal value of $y$ is determined in [64] by the initial relaxation, similarly to how it is determined in Algorithm 1. Therefore, Algorithm 1 without perturbation identifies the same or a better weak persistency. Algorithm 1 with perturbation identifies the same set as, or a larger set than, the set $M_{\rm strong}$ theoretically guaranteed in [64].
4.6. Quadratization Techniques

We now turn to the higher-order pseudo-Boolean case. A number of different reductions have been proposed [22, 15, 5]. They represent the initial function of 0-1 variables as a minimum of a quadratic function over auxiliary 0-1 variables. Persistency is then obtained by applying the QPBO method to the reduced problem. Since QPBO solves the FLP relaxation, our goal is to compare local relaxations, as well as relaxed-improving maps, before and after the reduction. Fortunately, full reductions [22, 15] are defined by chaining certain elementary reductions applied to separate cliques or groups of cliques. We define a sufficient set of atomic reductions with the following property: the maximum persistent subset by an FLP-improving mapping for the reduced problem is not larger than that for the initial problem. Chaining these atomic reductions, we obtain the following comparisons.
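To make "minimum of a quadratic function over auxiliary 0-1 variables" concrete, the sketch below verifies one classical elementary reduction. For $a < 0$, the identity is

$$a\,x_1x_2x_3 = \min_{w\in\{0,1\}} a\,w\,(x_1 + x_2 + x_3 - 2).$$

This is the standard reduction for a negative-coefficient cubic monomial. It is shown only as an example of the kind of step being chained. We do not claim it is exactly the rule used by HOCR [22] or by Fix et al. [15], both of which also handle positive coefficients. The function names are hypothetical.

```python
import itertools


def cubic_term(a, x):
    """a * x1 * x2 * x3 for a binary triple x."""
    return a * x[0] * x[1] * x[2]


def quadratized_term(a, x):
    """min over an auxiliary w in {0, 1} of a * w * (x1 + x2 + x3 - 2).

    The expression is quadratic in (x, w). For a < 0 every product w * x_i has a
    negative coefficient, so the reduced term is submodular.
    """
    return min(a * w * (x[0] + x[1] + x[2] - 2) for w in (0, 1))


def reduction_is_exact(a):
    return all(cubic_term(a, x) == quadratized_term(a, x)
               for x in itertools.product((0, 1), repeat=3))


print(reduction_is_exact(-3), reduction_is_exact(2))  # True False
```

As the second call shows, the identity fails for a positive coefficient, which is why several elementary reductions are needed in general.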

Theorem 4.6. Persistency by Higher Order Clique 
Reduction (HOCR) of Ishikawa [22] corresponds to 
an FLP-improving mapping. Proof on p. 31. 

Theorem 4.7. Persistency by the method of Fix et al. [15] corresponds to an FLP-improving mapping. Proof on p. 32.

Ishikawa [22] proposed a family of elementary reductions (called γ-flipping). He also posed the problem of which sequence of reductions gives, in a certain sense, the best overall reduction. This is a difficult combinatorial problem. We do not address it directly. Nevertheless, it follows that the FLP maximum persistency by Algorithm 1 dominates the persistencies that can be obtained by any reduction from the family, and hence also by the best one.

4.7. Bisubmodular Relaxations

Submodular/bisubmodular relaxations were introduced by Kolmogorov [27] as a natural generalization of the roof duality approach to higher-order pseudo-Boolean functions. Kolmogorov showed that all totally half-integral relaxations are bisubmodular relaxations and vice versa. Similar to roof duality, (bi)submodular relaxations have a global persistency property. However, many different (bi)submodular relaxations can be constructed for a given function. This approach faces two challenges. First, the class of all (bi)submodular relaxations is very large, and it is not tractable to parametrize it. Second, it is unclear which relaxation provides the largest persistent assignment.

Kahl and Strandmark [23] build upon graph-cut reducible submodular relaxations. They propose that the relaxation corresponding to the best lower bound on the energy is the optimal one. Their algorithm solves a series of linear programs to build the tightest graph-cut reducible submodular relaxation. However, not all submodular relaxations are graph-cut reducible. Except for cubic functions, determining which ones actually are is a hard problem [69]. Moreover, it is not clear whether the relaxation that gives the best lower bound is also the best one w.r.t. the size of the persistent assignment.
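Submodularity of a pseudo-Boolean function means $f(x \vee y) + f(x \wedge y) \le f(x) + f(y)$ for all $x, y \in \{0,1\}^n$. This can be tested directly by enumeration, though only for a handful of variables. The sketch below does that. It does not address graph-cut reducibility, which is the harder question mentioned above [69]. The function names are hypothetical.

```python
import itertools


def is_submodular(f, n):
    """Brute-force test of f(x v y) + f(x ^ y) <= f(x) + f(y) on {0,1}^n."""
    points = list(itertools.product((0, 1), repeat=n))
    for x in points:
        for y in points:
            join = tuple(max(a, b) for a, b in zip(x, y))
            meet = tuple(min(a, b) for a, b in zip(x, y))
            if f(join) + f(meet) > f(x) + f(y):
                return False
    return True


def neg_cubic(x):
    return -x[0] * x[1] * x[2]


def pos_cubic(x):
    return x[0] * x[1] * x[2]


print(is_submodular(neg_cubic, 3), is_submodular(pos_cubic, 3))  # True False
```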

We consider a more general case in which the relaxation is a sum of bisubmodular functions (SoB) over the same hypergraph as $f$. This class includes all graph-cut reducible submodular relaxations. We exploit two facts: an SoB function can be minimized exactly by the BLP relaxation [66], and the properties established in [27] hold. Together they give the following theorem.

Theorem 4.8. Persistency by SoB relaxation [27] 
corresponds to a BLP-improving mapping. Proof on 
p. 34. 

The work of Lu and Williams [43] is a special case of SoB relaxation. It follows that their result corresponds to a BLP-improving mapping as well.

4.8. Generalized Roof Duality

As discussed above, the method of Kahl and Strandmark [23] finds persistencies by SoB relaxation, and all such relaxations are dominated by BLP-improving maps. There is, however, a catch. The method reduces the problem progressively, finding a BLP-improving map in each iteration. Lemma 2.7 guarantees that fixing variables to their persistent values does not tighten the BLP relaxation. Eliminating persistent variables, however, does tighten it (as explained in Figure 7). It follows that their method is not, in general, dominated by a single BLP-improving map. On the other hand, we can easily claim domination by a single FLP-improving map. FLP is stronger than BLP and is not tightened by the elimination of persistent variables. This domination is also confirmed experimentally.

We avoid the difficult question [27, 23] of how to find the best SoS or SoB relaxation. Instead, we show how to find the same or a larger strong persistent assignment. For 3rd-order energies (quartic terms), Kolmogorov [27] gives an example where there is a tight bisubmodular relaxation but no tight submodular relaxation. It follows that in this case Algorithm 1 can determine a strictly larger strong persistent assignment than [23]. We give experimental confirmation of a larger persistent set for both cubic and quartic problems. The proposed two-phase algorithm also appears computationally more attractive than the series of LPs of Kahl and Strandmark [23].

Windheuser et al. [75] extended generalized roof duality [27, 23] to the multilabel case. For pairwise models, they showed equivalence with MQPBO. For higher-order models, the approach can be seen as a combination of the K to 2 transform [51] and submodular relaxation [27, 23]. We have already compared against (bi)submodular relaxations, and the K to 2 transform is component-wise. It should therefore follow that the sufficient condition of [75] corresponds to an FLP-improving mapping.

4.9. Persistency in 0-1 Polynomial Programming by Adams et al.

Adams et al. [2] proved a persistency result for the 0-1 polynomial programming problem. The result is based on the relaxation of Sherali and Adams [58], which can be identified with the FLP relaxation. They proposed a sufficient condition on the dual multipliers of the relaxation that provides a persistency guarantee. The sufficient condition is a linear feasibility program that can be verified for a given partial assignment, similar in spirit to our verification LP. However, they propose no method for finding a persistent partial assignment, except for the case when the integer part of the optimal relaxed solution turns out to be persistent. We show the following.

Theorem 4.9. Persistency by the sufficient condition of Adams et al. [2, Lemma 3.2] corresponds to an FLP-improving mapping. Proof on p. 36.

In fact, their sufficient condition splits the problem into two overlapping parts. The first is the part where an optimal assignment is unknown; call it the inner problem. The second contains all the assigned variables and the coupled unassigned variables; call it the outer problem. The sufficient condition guarantees that any choice of the unassigned variables, together with the assigned ones, delivers an optimal solution to the outer problem. What happens in the inner problem can therefore be effectively ignored: any feasible solution of the inner problem is optimal for the outer one.

5. Experiments 

We propose two families of experiments: one for pairwise multilabel energies and one for higher-order pseudo-Boolean energies. We first discuss the linear program (L1) for the FLP relaxation in these two cases.

5.1. Details of the L1 Program
The explicit form of (L1) for the pairwise multilabel case is given in [55]. It is expressed with variables $\xi$, which are related to $\zeta$ by $\xi_{s,i} = 1 - \zeta_{s,i}$ and $\xi_{st,ij} = 1 - \zeta_{s,i} - \zeta_{t,j} + \zeta_{st,ij}$. With this choice, $\xi_{st,ij}$ is the linearization of $\xi_{s,i}\xi_{t,j} = (1 - \zeta_{s,i})(1 - \zeta_{t,j})$. This parametrization is more convenient in the pairwise case. Its drawback is a more complex representation of $P$, which, however, is needed only for the proof.

Let us now consider the pseudo-Boolean case. Without loss of generality we assume that $y = 0$ (otherwise variable values can be flipped). Let $f$ be given in the form of a multilinear polynomial: $f_C(x_C) = \eta_C \prod_{s\in C} x_s$. We let $\zeta_C$ denote $\zeta_{C,1_C}$. Since $f_C$ is nonzero only at $x_C = 1_C$, the sums over $x_C \in \mathcal X_C$ and over subsets $D \subseteq C$ in $P^{\mathsf T} f$ collapse, and the expression simplifies as

$$(P^{\mathsf T} f)_C(x'_C) = [x'_C = 1_C]\,\zeta_C\,\eta_C. \quad (66)$$

The components of $(I - P^{\mathsf T}) f$ simplify as

$$f_C(x'_C) - [x'_C = 1_C]\,\zeta_C\,\eta_C = [x'_C = 1_C]\,(1 - \zeta_C)\,\eta_C.$$

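The set $\mathcal X_s \setminus \{y_s\}$ is simply $\{1\}$, and thus the polytope $\mathcal Z$ simplifies as

$$\zeta_\emptyset = 1, \quad (67)$$
$$(\forall C \in \mathcal E,\ \forall B \subseteq C)\quad \sum_{D \subseteq C \setminus B} (-1)^{|D|}\,\zeta_{B \cup D} \ge 0.$$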
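The inequalities in (67) are valid for every integral point. At the lifted point $\zeta_D = \prod_{s\in D} z_s$ of a binary $z$, the sum becomes

$$\prod_{s\in B} z_s \prod_{s\in C\setminus B}(1 - z_s),$$

which equals 1 when $B$ is exactly the set of ones of $z$ in $C$ and 0 otherwise. The sketch below checks this numerically for one clique $C$. The helper names are hypothetical, and the snippet verifies the identity only; it does not implement (L1).

```python
import itertools


def subsets(S):
    """All subsets of S as frozensets (including the empty set)."""
    S = tuple(S)
    for r in range(len(S) + 1):
        for combo in itertools.combinations(S, r):
            yield frozenset(combo)


def lift(z, C):
    """Lifted point zeta_D(z) = prod_{s in D} z_s for every D subset of C."""
    return {D: int(all(z[s] for s in D)) for D in subsets(C)}


def sa_lhs(zeta, C, B):
    """Left-hand side of (67): sum over D subset of (C minus B) of (-1)^|D| zeta_{B u D}."""
    return sum((-1) ** len(D) * zeta[B | D] for D in subsets(set(C) - B))


C = (0, 1, 2)
zeta = lift((1, 0, 1), C)
print(sa_lhs(zeta, C, frozenset({0, 2})), sa_lhs(zeta, C, frozenset({0})))  # 1 0
```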

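Problem (L1) can be written as

$$\min \sum_{s\in\mathcal V} \zeta_s \quad (68)$$
$$(\forall C \in \mathcal E)\quad (1 - \zeta_C)\,\eta_C - (A^{\mathsf T}\varphi)_C(1_C) \ge 0;$$
$$(\forall C \in \mathcal E)(\forall x'_C \ne 1_C)\quad -(A^{\mathsf T}\varphi)_C(x'_C) \ge 0;$$
$$-(A^{\mathsf T}\varphi)_\emptyset \ge 0;$$
$$\zeta \in \mathcal Z.$$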

An implementation in MATLAB is available at http://www.icg.tugraz.at/Members/shekhovtsov/persistency for research purposes.
5.2. Evaluation

We evaluated all methods on small random problems. The purpose of the experiments is to validate the theory and to check that the improvement obtained by the new method is not negligible. For all methods, including ours, we numerically verified for each instance that:

• the map $p$ constructed by the method is relaxed-improving w.r.t. the FLP relaxation (by solving the verification LP (24));

• the persistency guarantee is correct (by solving the initial and the reduced problems exactly).

We measure solution completeness as $\frac{n_{\rm elim}}{|\mathcal V|(K-1)} \cdot 100\%$. Here $K$ is the number of labels in every node ($|\mathcal X_s| = K$), and $n_{\rm elim}$ is the total number of labels ($s \in \mathcal V$, $i \in \mathcal X_s$) eliminated by the method as non-optimal.
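The completeness measure is a simple ratio. At most $K - 1$ labels can be eliminated per node, which gives the denominator $|\mathcal V|(K-1)$. The sketch assumes that eliminated labels are given as a set of (node, label) pairs; this input format and the function name are hypothetical.

```python
def completeness(eliminated, num_nodes, K):
    """Solution completeness in percent: n_elim / (|V| (K - 1)) * 100.

    eliminated: set of (s, i) pairs, i.e. label i ruled out at node s.
    """
    return 100.0 * len(eliminated) / (num_nodes * (K - 1))


print(completeness({(0, 1), (0, 2), (1, 0)}, num_nodes=2, K=3))  # 75.0
```

A value of 100% means a single label survives at every node, i.e., the full optimal labeling has been found.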

Pairwise Multilabel Models We report results on random problems with Potts interactions and with full interactions. Both types have unary weights $f_s(i) \sim \mathcal U[0, 100]$ (uniformly distributed). Full random energies have pairwise terms $f_{st}(i,j) \sim \mathcal U[0, 100]$. Potts energies have $f_{st}(i,j) = -\gamma_{st}(i)[i = j]$, where $\gamma_{st}(i) \sim \mathcal U[0, 50]$. All costs are integers, which allows exact verification of correctness. Only instances with a non-zero integrality gap w.r.t. the FLP relaxation (non-FLP-tight) are considered. The results are shown in Figure 12, and Table 4 gives details of the methods.
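The following sketch samples a Potts instance with the distributions stated above and applies one of the Table 4 baselines, Goldstein's simple DEE [17] (DEE1). It then checks by brute force that the reduced domains still contain an optimal labeling.

Several choices here are ours. The grid is 2×2 with $K = 3$ instead of the 10×10 grids of Figure 12, so that enumeration is feasible. We use Python's `random.Random` as the sampler. The DEE test uses the non-strict comparison $\ge 0$ and removes one label at a time, which keeps at least one optimal labeling. The non-FLP-tight filtering used in the experiments is not reproduced. All function names are hypothetical.

```python
import itertools
import random


def grid_edges(n):
    """Edges of a 4-connected n x n grid, nodes numbered row-major."""
    edges = []
    for r in range(n):
        for c in range(n):
            s = r * n + c
            if c + 1 < n:
                edges.append((s, s + 1))
            if r + 1 < n:
                edges.append((s, s + n))
    return edges


def random_potts(n, K, rng):
    """Integer costs: f_s(i) ~ U[0, 100], f_st(i, j) = -gamma_st(i) [i == j], gamma ~ U[0, 50]."""
    unary = [[rng.randint(0, 100) for _ in range(K)] for _ in range(n * n)]
    pairwise = {}
    for e in grid_edges(n):
        gamma = [rng.randint(0, 50) for _ in range(K)]
        pairwise[e] = [[-gamma[i] if i == j else 0 for j in range(K)] for i in range(K)]
    return unary, pairwise


def energy(x, unary, pairwise):
    return (sum(unary[s][xs] for s, xs in enumerate(x))
            + sum(w[x[s]][x[t]] for (s, t), w in pairwise.items()))


def simple_dee(unary, pairwise, K):
    """Simple DEE: drop label a at s if some surviving label b is never worse."""
    n = len(unary)
    alive = [set(range(K)) for _ in range(n)]
    nbrs = {s: [] for s in range(n)}
    for (s, t), w in pairwise.items():
        nbrs[s].append((t, w))                               # indexed w[x_s][x_t]
        nbrs[t].append((s, [list(col) for col in zip(*w)]))  # transposed for node t
    changed = True
    while changed:
        changed = False
        for s in range(n):
            for a in sorted(alive[s]):
                for b in sorted(alive[s] - {a}):
                    gain = unary[s][a] - unary[s][b] + sum(
                        min(w[a][xt] - w[b][xt] for xt in alive[t]) for t, w in nbrs[s])
                    if gain >= 0:   # switching a -> b never increases the energy
                        alive[s].discard(a)
                        changed = True
                        break
    return alive


rng = random.Random(0)
unary, pairwise = random_potts(2, 3, rng)
alive = simple_dee(unary, pairwise, 3)
print([sorted(a) for a in alive])
```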

Higher Order Binary Models The proposed evaluation of higher-order 0-1 models is based on the submodular library [61, 24]. The library interfaces the quadratization techniques HOCR [22] and Fix et al. [15], and it implements three variants of generalized roof duality [24], GRD*. Figure 13 shows the evaluation on random polynomials of degrees 3 and 4, sampled by the library. In the first series, we reproduce the results of [24] with similar parameters but smaller problems (e.g., $n = 100$ variables and $T = 30$ multilinear terms vs. $n = 1000$ and $T = 300$ in [24]). The results for the baseline methods are consistent with [24]. It turned out, however, that most of the instances are FLP-tight. In this case our method, as well as [64], reduces to solving the FLP relaxation and gives the trivial 100% persistency result. In the second series we increased the complexity by adding more terms, and we selected only non-FLP-tight instances. Here the proposed approach determines a significantly larger persistent assignment.

For Figure 14, we generated grid problems of two kinds. Problems of degree 3 have hyperedges $\{(i,j), (i+1,j), (i,j+1)\}$. Problems of degree 4 have hyperedges $\{(i,j), (i+1,j), (i,j+1), (i+1,j+1)\}$. In both cases there is one hyperedge at every grid location $(i,j) \in \{1, \dots, N-1\}^2$. For each such hyperedge $C$, we sampled the term $f_C$ as a random posiform: $2^{|C|}$ uniformly distributed numbers, one per configuration. This differs from Figure 13, where the coefficients of multilinear polynomials are sampled. It results in somewhat more difficult problems, because there is no bias from an asymmetric treatment of configurations. We again selected only non-FLP-tight instances. It turns out that for the problems of degree 4, none of the baseline methods identified more than 1-2% of the optimal solution, in contrast to the proposed method.
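The grid construction above can be sketched as follows. Coordinates are 1-based, as in the text. The value range of the random table entries is not stated here, so the integer range `[low, high]` is a placeholder assumption. The function names are hypothetical.

```python
import random


def grid_hyperedges(N, degree):
    """One hyperedge at every (i, j) in {1, ..., N-1}^2 of an N x N grid (1-based)."""
    if degree not in (3, 4):
        raise ValueError("degree must be 3 or 4")
    hyperedges = []
    for i in range(1, N):
        for j in range(1, N):
            C = [(i, j), (i + 1, j), (i, j + 1)]
            if degree == 4:
                C.append((i + 1, j + 1))
            hyperedges.append(tuple(C))
    return hyperedges


def random_table(C, rng, low=0, high=100):
    """One uniformly distributed integer per configuration of C (2^|C| entries).

    The range [low, high] is an assumed placeholder.
    """
    return [rng.randint(low, high) for _ in range(2 ** len(C))]


rng = random.Random(1)
terms = {C: random_table(C, rng) for C in grid_hyperedges(10, 4)}
print(len(terms))  # 81
```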

Running Time Figure 14(b) gives a rough idea of running times when CPLEX is used to solve the linear programs. The running times for L1-FLP and Swoboda et al. include only the time to solve the linear programs; all data preparation in MATLAB is excluded. The method of Swoboda et al. is the slowest one because it needs to solve several LP relaxations in the inner loop. (See, however, [64, 65] for its applicability with suboptimal solvers and incremental computation.) The proposed two-phase method (L1-FLP) solves two linear programs. Somewhat unexpectedly, the initialization phase (FLP) takes more than half of the total time. The optimal version of GRD performs similarly to the proposed method in time but determines fewer variables. GRD-heur is much faster, and its result is comparable to that of GRD. We conclude that practical applicability of the proposed method requires finding a feasible, possibly only approximately maximal, solution.

6. Conclusions 

Techniques for partial optimality sidestep the NP-hardness of energy minimization. They exploit various sufficient conditions under which a part of an optimal solution can be found. We proposed a new sufficient condition that corresponds to a given polyhedral relaxation and is verifiable in polynomial time. The condition generalizes the mechanism of improving mappings, which is present in many works, though often in a hidden form. This perspective lets us explain a variety of methods originating in different fields and relate them to linear relaxations. In particular, it follows that none of the covered methods can be used to tighten the FLP relaxation. Applying them as preprocessing when solving the FLP relaxation may only speed it up; it cannot change the set of optimal relaxed solutions. We formally posed and studied the problem of determining the largest set of persistent variables subject to the general sufficient condition. It turned out that reasonably large classes of this problem, restricted by the set of allowed mappings, can be solved in polynomial time. The proposed solution might not be the most efficient. Its generality, however, allows it to subsume the problem reformulations, reductions, equivalent transformations and choices that other existing techniques depend on. In bisubmodular relaxations, this is the choice of a bisubmodular lower bound function. In method [40], it is the choice of the order of labels and of the test labeling. In methods [22, 15], it is the choice of the sequence of reductions and flips. Optimizing these methods w.r.t. all such choices does not seem tractable. It is, however, tractable to find a persistent assignment (by the proposed method) that is at least as good as if these choices had been optimized.

Figure 12: Evaluation of pairwise multilabel methods. Problem size is 10x10, 4-connected. Bars of different shades indicate the portion of the sample under the given solution completeness value (statistics over 100 instances).

DEE1: Goldstein's simple DEE [17]. If $f_s(\alpha) - f_s(\beta) + \sum_{t\in\mathcal N(s)} \min_{x_t}[f_{st}(\alpha, x_t) - f_{st}(\beta, x_t)] \ge 0$, eliminate $\alpha$. Iterate until no elimination is possible.

DEE2: Similar to DEE1, but also including the pairwise condition $f_s(\alpha_s) - f_s(\beta_s) + f_t(\alpha_t) - f_t(\beta_t) + f_{st}(\alpha_{st}) - f_{st}(\beta_{st}) + \sum_{t'\in\mathcal N(s)\setminus\{t\}} \min_{x_{t'}}[f_{st'}(\alpha_s, x_{t'}) - f_{st'}(\beta_s, x_{t'})] + \sum_{t'\in\mathcal N(t)\setminus\{s\}} \min_{x_{t'}}[f_{tt'}(\alpha_t, x_{t'}) - f_{tt'}(\beta_t, x_{t'})] > 0$.

MQPBO(-P): The method of Kohli et al. [25]. The problem reduced to {0,1} variables is solved by QPBO(-P) [48], where "-P" is the variant with probing [8]. In the options for probing we chose "use weak persistencies", "allow all possible directed constraints" and "dilation=1".

Kovtun: One-against-all Kovtun's method [40]. We run a single pass over $\alpha = 1, \dots, K$ (the test labelings are $(y_s = \alpha \mid s \in \mathcal V)$). Labels eliminated in earlier steps are correctly taken into account in subsequent steps. Reimplementation.

Swoboda: Iterative Pruning method of Swoboda et al. [64], using CPLEX [1] for each iteration. Reimplementation.

L1: The proposed method in Algorithm 1 for class $\mathcal P^2$ without perturbation; both phases are solved with CPLEX [1].

DEE2+L1: Sequential application of DEE2 and L1. Note that DEE2 uses a condition on pairs that is not covered by the proposed sufficient condition under the pairwise BLP relaxation.

Table 4: List of tested methods for pairwise multilabel evaluation.

In the experimental evaluation we verified that our theoretical comparisons hold true. That is, all evaluated methods (except DEE2, for which we make no claim) output FLP-improving maps in all test cases. Our linear program (L1) always had an integer optimal solution $\xi$.⁴ The persistent assignment found by our method with the FLP relaxation was larger per instance and significantly larger on average.


⁴ With the exception of a few cases in which CPLEX experienced a numerical error.
Figure 13: Evaluation of higher-order binary methods on random polynomials generated as in [24] for $n = 100$ variables. Plots with $(d = 3, T = 100)$ and $(d = 4, T = 30)$ reproduce results reported in [24]. Most of these instances are FLP-tight and are thus solved exactly by L1. We increase the complexity by evaluating $(d = 3, T = 200)$ and $(d = 4, T = 50)$ and selecting only non-FLP-tight instances.

Figure 14: Evaluation of higher-order methods on non-FLP-tight problems on a grid. Left: problems of degree 3 and 4 on a 10x10 grid. Note that degree 4 appears difficult for existing methods. Right: running time for problems of degree 3 and varied size up to 60x60. Percentages indicate solution completeness for selected points.


6.1. Discussion 

Iterative Application Do we get more persistencies if the algorithm is run iteratively?

For the FLP relaxation in the cases where maximality is guaranteed, a subsequent application of the method cannot give an improvement, since that would contradict maximality. Maximality is achieved in the pseudo-Boolean or multilabel class $\mathcal P^1$ under strong persistency. It is also achieved if we keep the test labeling $y$ fixed and consider the class $\mathcal P^{2,y}$, for both weak and strong persistency. In the other cases, iterating the method could improve the result. For the BLP relaxation, excluding persistent variables may lead to a tighter relaxation (see Figure 7). The method can therefore be iterated similarly to generalized roof duality, but the result is still dominated by L1-FLP. In the multilabel case, we can iterate while varying the test labeling $y$. This is, however, computationally expensive and does not seem practical.

Efficiency The present work focused on theoretical aspects. Practical applicability of the method requires further research. In particular, efficient specialized methods are needed that use approximate solutions of the relaxation, as in [64], a windowing technique [56], or similar tools. The method of Swoboda et al. [64] does not perform best in Figure 14 and is also the slowest when implemented with CPLEX. However, it can be made optimal for the class $\mathcal P^1$, as proposed in [65], and made fast in practice with scalable dual solvers. Our most recent work in this direction [57] proposes an algorithm of this type for the pairwise multilabel case and the $\mathcal P^{2,y}$ class of maps. It can be viewed as an alternative algorithm for problem (L1), combinatorial w.r.t. the mapping, and it achieves the necessary efficiency.

Open Questions We showed that strict persistency leads to a tractable problem for a larger set of maps. It also guarantees that ambiguous solutions are not removed. Increasing $\varepsilon$ in the perturbation method potentially gives a stronger guarantee w.r.t. the uncertainty of the data, which may be worth exploring. The general approach holds for an arbitrary bounded polytope, so global linear constraints can also be incorporated. This suggests a generalization to linearly constrained discrete optimization problems or mixed integer linear programs. Another interesting direction is how to combine the proposed persistency method with cutting plane techniques. Finally, the polynomiality of the (max-si) problem remains open in the following cases:

(i) All node-wise maps in a problem with 3 labels.

(ii) Maps of the form $x \mapsto (x \vee x^{\min}) \wedge x^{\max}$ with $x^{\min}$ and $x^{\max}$ as free variables. This case is relevant when labels have a natural linear ordering. An optimal solution in this case would improve over the iterative method of Kovtun [40], MQPBO and the method [75].

A. Proofs 

Lemma 1.1. The conical hull of a relaxation polytope $\Lambda$ (in the form (8), non-empty and bounded) is obtained by dropping the constraint $\mu_0 = 1$:

$$\operatorname{coni}(\Lambda) = \{\mu \in \mathbb R^{\mathcal I} \mid A\mu \ge 0;\ \mu \ge 0\}. \quad (10)$$

Proof. We will show that the defining set (9) is contained in (10) and vice versa. Let $\mu \in \Lambda$ and let $\alpha \ge 0$. Then $A\alpha\mu = \alpha A\mu \ge 0$ and $\alpha\mu \ge 0$. Therefore $\alpha\mu$ is contained in the set (10). Now let $\mu$ belong to the set (10). If $\mu_0 > 0$, we can select $\alpha = \mu_0$ and the vector $\mu' = \mu/\mu_0 \in \Lambda$, and conclude that $\mu = \alpha\mu'$ belongs to (9). Now let $\mu_0 = 0$, and assume for contradiction that $\mu \ne 0$. The set $\Lambda$ is non-empty by assumption; let $\mu' \in \Lambda$. Then for any $\alpha \ge 0$ there holds $\mu' + \alpha\mu \in \Lambda$. But $\|\mu' + \alpha\mu\| \ge \alpha\|\mu\|$ (both vectors are non-negative), which is unbounded in $\alpha$ and contradicts the boundedness of $\Lambda$. Therefore $\mu = 0$, which belongs to the set (9). □

Statement 2.5. Let $\mathcal O = \operatorname{argmin}_{\mu\in\Lambda} \langle (I - P)^{\mathsf T} f, \mu\rangle$. There holds $P \in \mathbb S_f$ iff

$$\mathcal O = P(\Lambda). \quad (25)$$

Proof. By idempotency, $P(\Lambda) = \{\mu \in \Lambda \mid P\mu = \mu\}$. Let $\mu' \in P(\Lambda)$. Since $\langle (I - P^{\mathsf T}) f, \mu'\rangle = 0$, the value of the verification LP, $\min_{\mu\in\Lambda}\langle (I - P^{\mathsf T}) f, \mu\rangle$, is clearly not positive.

Direction "⇐". Let (25) hold. For $\mu \in P(\Lambda) = \mathcal O$, condition (23) is trivially satisfied, and we find that the value of the verification LP is zero. For $\mu \in \Lambda \setminus P(\Lambda)$, it follows from (25) that $\mu$ is not a minimizer. Therefore the objective must be strictly larger than zero, and the strict inequality in (23) is satisfied.

Direction "⇒". Let $P$ satisfy (23). Then the value of the verification LP is zero. Any $\mu \in P(\Lambda)$ satisfies $P\mu = \mu$ and achieves zero objective, so $P(\Lambda) \subseteq \mathcal O$. Now let $\mu \notin P(\Lambda)$. In this case $P\mu \ne \mu$, and inequality (23) is strict. Therefore $\mu \notin \mathcal O$. □

The condition (25) can be further expressed in the 
components of mapping p as follows. 

Statement 2.6. Let $\mathcal O_C = \{x_C \in \mathcal X_C \mid (\exists \mu \in \mathcal O)\ \mu_C(x_C) > 0\}$. There holds $[p] \in \mathbb S_f$ iff

$$(\forall C \in \mathcal E)\quad \mathcal O_C = p_C(\mathcal X_C). \quad (26)$$

Proof. Direction "⇒": we prove the contrapositive. Assume $(\exists C, \exists x_C \in \mathcal O_C)\ p_C(x_C) \ne x_C$. Then $\exists \mu \in \mathcal O$ such that $\mu_C(x_C) > 0$. By idempotency, there is no $x'_C$ such that $p_C(x'_C) = x_C$. By (15), $([p]\mu)_C(x_C) = 0 \ne \mu_C(x_C)$, and therefore $[p]\mu \ne \mu$. From (25) it follows that $[p] \notin \mathbb S_f$.

Direction "⇐": Let (26) hold, and let $\mu \in \mathcal O$. Then for all $C$, $x_C$ such that $\mu_C(x_C) > 0$, there holds $p_C(x_C) = x_C$. It follows from (15) that $([p]\mu)_C = \mu_C$, and thus $[p]\mu = \mu$. Therefore $[p](\mathcal O) = \mathcal O$, hence $\mathcal O \subseteq P(\Lambda)$, and the value of the verification LP is zero. Because $P(\Lambda) = \{\mu \in \Lambda \mid P\mu = \mu\}$, the value of the objective on $P(\Lambda)$ is zero, and thus $P(\Lambda) \subseteq \mathcal O$. From (25) it follows that $[p] \in \mathbb S_f$. □

A.1. Properties

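Lemma 2.7 (Necessary conditions I). Let $p\colon \mathcal X \to \mathcal X$ be node-wise and $P = [p]$. Let $\mathcal O = \operatorname{argmin}_{\mu\in\Lambda}\langle f, \mu\rangle$ and $\mathcal O_C = \{x_C \in \mathcal X_C \mid (\exists \mu \in \mathcal O)\ \mu_C(x_C) > 0\}$. Then:

(i) For $P \in \mathbb W_f$ there holds

$$P(\mathcal O) \subseteq \mathcal O; \quad (27a)$$
$$(\forall C \in \mathcal E)\quad p_C(\mathcal O_C) \subseteq \mathcal O_C. \quad (27b)$$

(ii) For $P \in \mathbb S_f$ there holds

$$P(\mathcal O) = \mathcal O; \quad (28a)$$
$$(\forall C \in \mathcal E)\quad p_C(\mathcal O_C) = \mathcal O_C. \quad (28b)$$

Proof. (i) Assume $(\exists \mu \in \mathcal O)\ P\mu \in \Lambda \setminus \mathcal O$. Then $\langle f, P\mu\rangle > \langle f, \mu\rangle$, and therefore $P \notin \mathbb W_f$. This proves (27a).

Assume for contradiction that (27b) does not hold, i.e., $(\exists C \in \mathcal E, \exists x_C \in \mathcal O_C)\ x'_C := p_C(x_C) \notin \mathcal O_C$. Because $x_C \in \mathcal O_C$, there exists $\mu \in \mathcal O$ such that $\mu_C(x_C) > 0$. From expression (16) it follows that $([p]\mu)_C(x'_C) > 0$. But $x'_C \notin \mathcal O_C$, and therefore $[p]\mu \notin \mathcal O$, which contradicts (27a).

(ii) Assume $(\exists \mu \in \mathcal O)\ P\mu \ne \mu$. Then $\langle f, P\mu\rangle \ge \langle f, \mu\rangle$, and therefore $P \notin \mathbb S_f$. This proves (28a).

Assume for contradiction that (28b) does not hold, i.e., $(\exists C \in \mathcal E, \exists x_C \in \mathcal O_C)\ p_C(x_C) \ne x_C$. Because $x_C \in \mathcal O_C$, there exists $\mu \in \mathcal O$ such that $\mu_C(x_C) > 0$. Since $p_C(x_C) \ne x_C$, it follows from idempotency and (15) that $([p]\mu)_C(x_C) = 0 \ne \mu_C(x_C)$. Thus $[p]\mu \ne \mu$, which contradicts (28a). □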

Theorem 2.9 (Dual representation of $\mathbb S_f$). Let $p\colon \mathcal X \to \mathcal X$ be node-wise. Then:

(i) there exists $\varepsilon > 0$ such that $[p] \in \mathbb S_f$ iff

$$(\exists \varphi \in \mathbb R^m_+)\quad f - A^{\mathsf T}\varphi - [p]^{\mathsf T} f \ge \varepsilon h, \quad (32)$$

where $h$ is a function such that $h \ge 0$ and $h_C(x_C) = 0$ iff $p_C(x_C) = x_C$; and

(ii) for rational inputs (including $h$), the value of $\varepsilon$ in (i) is a rational number of polynomial bit length.

Proof. (i) The "if" part. Condition (32) implies the weaker condition

$$f - A^{\mathsf T}\varphi - [p]^{\mathsf T} f \ge 0, \quad (69)$$

i.e., it satisfies the dual representation of $\mathbb W_f$ (29). Therefore $p$ is relaxed-improving, and it remains to prove strictness. The value of the verification LP in (24) is zero. The value of its dual problem

$$\max\{\psi \in \mathbb R \mid f - A^{\mathsf T}\varphi - [p]^{\mathsf T} f - \psi e_0 \ge 0,\ \varphi \in \mathbb R^m_+\} \quad (70)$$

is thus also zero. It follows that $\varphi$ is optimal for (70). We need to show that $[p]\mu = \mu$ for $\mu \in \mathcal O$. Multiplying (32) by $\mu_C(x_C)$ and summing over $C$ and $x_C$, we obtain

$$\langle (I - [p]^{\mathsf T}) f, \mu\rangle - \langle \varphi, A\mu\rangle \ge \varepsilon\langle h, \mu\rangle. \quad (71)$$

Because $\mu$ and $\varphi$ are optimal primal and dual solutions, complementary slackness gives $\langle \varphi, A\mu\rangle = 0$. Assume for contradiction that $[p]\mu \ne \mu$. Then $(\exists C \in \mathcal E, \exists x_C \in \mathcal X_C)\ ([p]\mu)_C(x_C) \ne \mu_C(x_C)$. We consider two cases.

Case 1: $p_C(x_C) \ne x_C$. Then by idempotency $p_C(x'_C) \ne x_C$ for all $x'_C$, and therefore (15) gives $([p]\mu)_C(x_C) = 0$. In this case the assumption implies $\mu_C(x_C) > 0$ and

$$\langle \mu, h\rangle \ge \mu_C(x_C)\, h_C(x_C) > 0. \quad (72)$$

Case 2: $p_C(x_C) = x_C$. Then (15) and the assumption imply that there exists $x'_C \ne x_C$ such that $p_C(x'_C) = x_C$ and $\mu_C(x'_C) > 0$. In this case

$$\langle \mu, h\rangle \ge \mu_C(x'_C)\, h_C(x'_C) > 0. \quad (73)$$

In both cases, (71) yields $\langle (I - [p]^{\mathsf T}) f, \mu\rangle > 0$, which contradicts the optimality of $\mu$.

We now prove the "only if" part of (i). Let $g = (I - P^{\mathsf T}) f$, and let $(\mu, (\varphi, \psi))$ be a primal-dual strictly complementary pair of solutions to

$$\min_{\mu\in\Lambda}\ \langle g, \mu\rangle. \quad (74)$$

Let $\mathcal O_C$ be the $C$-support set of the primal solution: $\mathcal O_C = \{x_C \in \mathcal X_C \mid \mu_C(x_C) > 0\}$. By Statement 2.6 and idempotency, $p_C(x_C) = x_C$ for $x_C \in \mathcal O_C$. Denote the reparametrized costs by $g^\varphi = g - A^{\mathsf T}\varphi$. By strict complementarity, $g^\varphi_C(x_C) = 0$ for $x_C \in \mathcal O_C$ and $g^\varphi_C(x_C) > 0$ for $x_C \notin \mathcal O_C$. We let

$$\varepsilon = \min_{C\in\mathcal E,\ x_C\in\mathcal X_C\setminus\mathcal O_C} \frac{g^\varphi_C(x_C)}{h_C(x_C)} > 0, \quad (75)$$

with the convention that a ratio with $h_C(x_C) = 0$ equals $+\infty$. Since $h_C(x_C) = 0$ for $x_C \in \mathcal O_C$, we can now bound the components of $g^\varphi$ as follows:

$$(\forall C \in \mathcal E, \forall x_C)\quad g^\varphi_C(x_C) \ge \varepsilon h_C(x_C). \quad (76)$$

Expanding the components of $g^\varphi$ as $g^\varphi_C(x_C) = f_C(x_C) - f_C(p_C(x_C)) - (A^{\mathsf T}\varphi)_C(x_C)$, we obtain relations (32).

Part (ii) of the theorem is proved as follows. The bit length of the rational dual solution $\varphi$ is polynomially bounded, and so is the bit length of the rational numbers $h$. It follows that $\varepsilon$ calculated by (75) is a rational number of polynomially bounded bit length. □
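Formula (75) and the bound (76) can be illustrated with exact rational arithmetic, in the spirit of part (ii).

The sketch below makes several assumptions. The reparametrized costs $g^\varphi$ and the function $h$ are given as dictionaries keyed by (clique, configuration). Strict complementarity has already made $g^\varphi$ positive wherever $h$ is positive. The +∞ convention is implemented by skipping entries with $h = 0$. The data and the name `strictness_margin` are hypothetical.

```python
from fractions import Fraction


def strictness_margin(g_phi, h):
    """epsilon from (75): minimum of g_phi / h over entries with h > 0.

    Entries with h = 0 (where p_C(x_C) = x_C) impose no bound, matching the
    +infinity convention. Returns None if h vanishes everywhere (identity map).
    """
    ratios = [Fraction(g_phi[k]) / Fraction(h[k]) for k in h if h[k] > 0]
    return min(ratios) if ratios else None


g_phi = {("C", (0, 0)): 0, ("C", (0, 1)): Fraction(3, 2), ("C", (1, 0)): 4, ("C", (1, 1)): 0}
h = {("C", (0, 0)): 0, ("C", (0, 1)): 1, ("C", (1, 0)): 2, ("C", (1, 1)): 0}
eps = strictness_margin(g_phi, h)
print(eps)  # 3/2
```

With this $\varepsilon$, the component-wise bound $g^\varphi \ge \varepsilon h$ of (76) holds by construction.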


Theorem 2.10 (Necessary conditions II). Let $P$ be idempotent, $P(\Lambda) \subseteq \Lambda$ and $P \in \mathbb W_f$. Then

$$\inf_{\substack{A(I-P)\mu \ge 0\\ \mu \ge 0}} \langle (I - P)^{\mathsf T} f, \mu\rangle = 0; \quad (33a)$$
$$(\exists \varphi \in \mathbb R^m_+)\quad (I - P^{\mathsf T})(f - A^{\mathsf T}\varphi) \ge 0. \quad (33b)$$

Proof. Let $g = (I - P^{\mathsf T}) f$. The steps of the proof are given by the following chain:

$$0 \overset{(a)}{=} \inf_{\substack{A\mu \ge 0\\ \mu \ge 0}} \langle g, \mu\rangle \overset{(b)}{\le} \inf_{\substack{AP\mu \ge 0\\ A(I-P)\mu \ge 0\\ \mu \ge 0}} \langle g, \mu\rangle \overset{(c)}{=} \inf_{\substack{A(I-P)\mu \ge 0\\ \mu \ge 0}} \langle g, \mu\rangle \overset{(d)}{=} \sup_{\substack{\varphi\in\mathbb R^m_+\\ (I-P^{\mathsf T})(f - A^{\mathsf T}\varphi) \ge 0}} 0. \quad (77)$$

The first minimization in the chain is problem (30). If $P \in \mathbb W_f$, this problem is bounded and its value is zero, which is equality (a). Equalities (c) and (d) essentially claim the boundedness of the other two minimization problems in the chain.

Inequality (b) is verified as follows. Summing the two inequalities

$$AP\mu \ge 0, \quad (78a)$$
$$A(I - P)\mu \ge 0 \quad (78b)$$

gives $A\mu \ge 0$, so the feasible set of the middle problem is contained in that of the first.

Equality (c) is the key step. We removed one constraint, therefore "≥" trivially holds. Let us prove "≤". Let $\mu$ be feasible for the RHS of equality (c). Let $\mu = \mu_1 + \mu_2$, where

$$\mu_1 = P\mu, \quad \mu_2 = (I - P)\mu. \quad (79)$$

There holds

$$(I - P)\mu_1 = (I - P)P\mu = (P - P^2)\mu = 0, \quad P\mu_2 = P(I - P)\mu = 0, \quad (80)$$

i.e., $\mu_1 \in \operatorname{null}(I - P)$ and $\mu_2 \in \operatorname{null}(P)$. Let us choose $\mu'_1$ such that

$$\mu'_1 \ge \mu_1 \quad\text{and}\quad A\mu'_1 \ge 0. \quad (81)$$

For example, the relaxed labeling

$$\mu'_1 = \frac{\gamma}{|\mathcal X|}\sum_{x\in\mathcal X}\delta(x) \quad (82)$$

satisfies these constraints for sufficiently large $\gamma > 0$. Indeed, all components of $\mu'_1$ are strictly positive. Moreover, $\mu'_1/\gamma$ belongs to $\mathcal M$ as a convex combination of integer labelings, so $\mu'_1$ satisfies the relaxation constraints $A\mu'_1 \ge 0$ for any $\gamma > 0$. It remains to choose $\gamma$ large enough that $\mu'_1 \ge \mu_1$ is satisfied. Notice that $(\mu'_1)_0 = \gamma$. Let $\mu''_1 = P\mu'_1$. Because $\mu'_1 \ge \mu_1$ and $P\mu_1 = P^2\mu = \mu_1$, we have

$$\mu''_1 = P\mu'_1 \ge P\mu_1 = \mu_1. \quad (83)$$

Because $\mu'_1 \in \operatorname{coni}(\Lambda)$, there holds

$$AP\mu''_1 = APP\mu'_1 = AP\mu'_1 \ge 0.$$

By idempotency, $(I - P)\mu''_1 = (I - P)P\mu'_1 = 0$. Let $\mu^* = \mu''_1 + \mu_2$. It preserves the objective:

$$\langle (I - P^{\mathsf T}) f, \mu^*\rangle = \langle f, (I - P)(\mu''_1 + \mu_2)\rangle = \langle f, (I - P)\mu_2\rangle = \langle f, (I - P)\mu\rangle. \quad (84)$$

We also have

$$\mu^* = \mu''_1 + \mu_2 \ge \mu_1 + \mu_2 = \mu \ge 0,$$
$$A(I - P)\mu^* = A(I - P)\mu_2 = A(I - P)\mu \ge 0, \quad (85)$$
$$AP\mu^* = AP\mu''_1 \ge 0.$$

Therefore, $\mu^*$ satisfies all constraints of the LHS of equality (c). Equality (d) is the duality relation; it asserts that the maximization problem on the RHS is feasible. □

A.2. Relaxation of Sherali and Adams

Lemma 2.11 (Identity Equality (53)). Let $G(\zeta)$ be the linearization of $g(z)$. Then $g(z) = 0$ for all $z \in \{0,1\}^C$ iff $G(\zeta) = 0$ for all $\zeta \in \mathbb R^{2^C}$.

Proof. The correspondence between $g$ and $G$ is through the coefficients $\alpha$:

$$g(z) = \sum_{D\subseteq C}\alpha_D\prod_{s\in D} z_s, \quad (86a)$$
$$G(\zeta) = \sum_{D\subseteq C}\alpha_D\,\zeta_D. \quad (86b)$$

We have $g = 0$ iff all coefficients $\alpha$ of the (unique) multilinear polynomial representation are zero, and this is the case iff $G = 0$. □


For subsequent proofs let us introduce the corre¬ 
spondence between binary variables 0 and their lifted 
representation ( as the mapping £(2) from {0,1} C to 
M 2 with components 

c« D =rh- ( 87 ) 

SGD 

Lemma 2.13 (Identity inequality (S5)). Let $G(\zeta)$ be the linearization of $g(z)$. Then $g(z)\ge0$ for all $z\in\{0,1\}^C$ iff $G(\zeta)\ge0$ for all $\zeta\in\mathcal{Z}_C$.

Proof. ($\Leftarrow$) This part follows from the fact that $\zeta(z)$ satisfies all constraints of $\mathcal{Z}_C$ and thus $g(z) = G(\zeta(z))\ge0$ for all $z\in\{0,1\}^C$.

($\Rightarrow$) Note that the special case $g(z) = \prod_{s\in B}z_s\prod_{s\in A}(1-z_s)$ with $A\cap B = \emptyset$, $A,B\subseteq C$, is proven in [58, Lemma 2]. Here is a different, general proof.

Since $g$ is non-negative, it can be represented as a posiform [6, Proposition 1]:

$$g(z) = \sum_{B\subseteq C}a_B\prod_{s\in B}z_s\prod_{s\in C\setminus B}(1-z_s), \tag{88}$$

where $a_B = g(\mathbb{1}_B)\ge0$ and $\mathbb{1}_B\in\{0,1\}^C$ is the indicator of the set $B$ defined as $(\mathbb{1}_B)_s := [s\in B]$. By the linear combination property (S3), $G(\zeta)$ can be written as

$$G(\zeta) = \sum_{B\subseteq C}a_B\sum_{D\subseteq C\setminus B}(-1)^{|D|}\zeta_{D\cup B}, \tag{89}$$

which is a non-negative combination of non-negative summands, as ensured by the constraints of $\mathcal{Z}_C$. □
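The correspondence (86) and the identity (89) can be checked numerically on a small clique. The following Python sketch is ours, not the authors' implementation: it computes the coefficients $\alpha_D$ by Möbius inversion of the values $g(\mathbb{1}_B)$, the lifted vertices (87), and the inner sums of (89), which we assume to be exactly the left-hand sides of the inequalities defining $\mathcal{Z}_C$. The helper names (`multilinear_coeffs`, `sa_terms`, etc.) are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
import random


def subsets(items):
    """All subsets of `items` as frozensets, including the empty set."""
    items = tuple(sorted(items))
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def indicator(B, C):
    """The labeling 1_B in {0,1}^C, aligned with the tuple C."""
    return tuple(int(s in B) for s in C)


def multilinear_coeffs(g, C):
    """Coefficients alpha_D of (86a) by Moebius inversion of the values g(1_B)."""
    return {D: sum((-1) ** len(D - B) * g(indicator(B, C)) for B in subsets(D))
            for D in subsets(C)}


def lift(z, C):
    """Lifted vertex zeta(z) of (87): zeta_D = prod over s in D of z_s."""
    return {D: int(all(z[C.index(s)] for s in D)) for D in subsets(C)}


def linearize(alpha, zeta):
    """G(zeta) = sum_D alpha_D zeta_D, cf. (86b)."""
    return sum(a * zeta[D] for D, a in alpha.items())


def sa_terms(zeta, C):
    """Inner sums of (89): sum over D in (C minus B) of (-1)^|D| zeta_{D u B}, one per B."""
    full = frozenset(C)
    return {B: sum((-1) ** len(D) * zeta[D | B] for D in subsets(full - B))
            for B in subsets(C)}


C = ("a", "b", "c")
rng = random.Random(0)
table = {z: rng.randint(0, 5) for z in product((0, 1), repeat=len(C))}  # a non-negative g


def g(z):
    return table[z]


alpha = multilinear_coeffs(g, C)

# G coincides with g on lifted vertices (the correspondence used in Lemma 2.11).
assert all(linearize(alpha, lift(z, C)) == g(z) for z in table)

# Identity (89) is linear in zeta, so it holds for an arbitrary vector zeta.
zeta = {D: Fraction(rng.randint(-3, 3), 4) for D in subsets(C)}
terms = sa_terms(zeta, C)
assert linearize(alpha, zeta) == sum(g(indicator(B, C)) * terms[B] for B in subsets(C))

# On a lifted vertex every inner sum of (89) is 0 or 1, in particular non-negative.
assert all(v in (0, 1) for z in table for v in sa_terms(lift(z, C), C).values())
```

Because (89) is linear in $\zeta$, the second check does not need $\zeta\in\mathcal{Z}_C$; non-negativity of $G$ on $\mathcal{Z}_C$ then follows from non-negativity of each inner sum.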

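Lemma 2.12 (Convex hull). Polytope $\mathcal{Z}_C$ equals the convex hull

$$\operatorname{conv}\Big\{\zeta\in\mathbb{R}^{2^C}\ \Big|\ \zeta_D = \prod_{s\in D}z_s\ \ \forall D\subseteq C,\ z\in\{0,1\}^C\Big\}. \tag{38}$$

Proof. Let $\mathcal{P}$ denote the convex hull (38). Clearly, any vertex of $\mathcal{P}$ is in $\mathcal{Z}_C$ and therefore $\mathcal{P}\subseteq\mathcal{Z}_C$.

Let $G(\zeta) = \sum_{D\subseteq C}\alpha_D\zeta_D\ge0$ be a facet-defining inequality of $\mathcal{P}$. Let us show that it holds for all $\zeta\in\mathcal{Z}_C$. Let $g(z) = \sum_{D\subseteq C}\alpha_D\prod_{s\in D}z_s$, $z\in\{0,1\}^C$, be the multilinear polynomial corresponding to $G$. For all vertices $\zeta(z)$ of $\mathcal{P}$ there holds $G(\zeta(z)) = g(z)$ and at the same time $G(\zeta(z))\ge0$. It follows that $g(z)\ge0$ for all $z\in\{0,1\}^C$ and, by Lemma 2.13, $G(\zeta)\ge0$ for all $\zeta\in\mathcal{Z}_C$. Since this holds for every facet of $\mathcal{P}$, we have proven that $\mathcal{Z}_C\subseteq\mathcal{P}$. □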

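Lemma 2.15 (Product). For $\zeta\in\mathcal{Z}_C$ there holds $\zeta^2\in\mathcal{Z}_C$, where the product $\zeta^2 = \zeta\zeta$ is component-wise.

Proof. Using the convex hull property, we can represent $\zeta\in\mathcal{Z}_C$ as a convex combination of vertices, i.e., $\zeta_D = \sum_{k=1}^n\alpha_k\prod_{s\in D}z^k_s$ for all $D\subseteq C$, where $z^k\in\{0,1\}^C$ for $k = 1,\dots,n$, $\alpha_k\ge0$ and $\sum_{k=1}^n\alpha_k = 1$. Then

$$(\zeta^2)_D = \sum_{k_1=1}^n\sum_{k_2=1}^n\alpha_{k_1}\alpha_{k_2}\prod_{s\in D}z^{k_1}_s\prod_{s\in D}z^{k_2}_s \tag{90}$$

$$= \sum_{k_1,k_2=1}^n\alpha_{k_1}\alpha_{k_2}\prod_{s\in D}z^{k_1,k_2}_s, \tag{91}$$

where $z^{k_1,k_2}\in\{0,1\}^C$ is the coordinate-wise product of $z^{k_1}$ and $z^{k_2}$. Note that $\alpha_{k_1}\alpha_{k_2}\ge0$ and $\sum_{k_1,k_2=1}^n\alpha_{k_1}\alpha_{k_2} = 1$. Expressions (90)-(91) prove that $\zeta^2$ is representable as a convex combination of vertices and thus belongs to $\mathcal{Z}_C$. One could similarly show that for $\zeta,\xi\in\mathcal{Z}_C$ their product $\zeta\xi$ also belongs to $\mathcal{Z}_C$. □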
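Lemma 2.15 can be illustrated numerically: take a point of $\mathcal{Z}_C$ given as a convex combination of lifted vertices, square it component-wise, and compare with the right-hand side of (91). A minimal sketch with exact rational arithmetic; the vertices and weights are arbitrary illustrative choices, and the inequalities of $\mathcal{Z}_C$ are assumed to be the inner sums of (89).

```python
from fractions import Fraction
from itertools import combinations
import random


def subsets(items):
    items = tuple(sorted(items))
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def lift(z, C):
    """Vertex zeta(z) with zeta_D = prod over s in D of z_s, cf. (87) and (38)."""
    return {D: int(all(z[C.index(s)] for s in D)) for D in subsets(C)}


def sa_terms(zeta, C):
    """sum over D in (C minus B) of (-1)^|D| zeta_{D u B}, for every B (assumed form of Z_C)."""
    full = frozenset(C)
    return {B: sum((-1) ** len(D) * zeta[D | B] for D in subsets(full - B))
            for B in subsets(C)}


C = ("a", "b", "c")
rng = random.Random(1)
vertices = [tuple(rng.randint(0, 1) for _ in C) for _ in range(4)]
raw = [Fraction(w) for w in (1, 2, 3, 4)]
total = sum(raw)
weights = [w / total for w in raw]                         # alpha_k >= 0, sum equals 1

# A point of Z_C written as a convex combination of vertices (Lemma 2.12).
zeta = {D: sum(a * lift(z, C)[D] for a, z in zip(weights, vertices)) for D in subsets(C)}
zeta_sq = {D: v * v for D, v in zeta.items()}              # component-wise square

# Right-hand side of (91): weights alpha_k1 * alpha_k2 on coordinate-wise products.
zeta_91 = {D: sum(a1 * a2 * lift(tuple(u * v for u, v in zip(z1, z2)), C)[D]
                  for a1, z1 in zip(weights, vertices)
                  for a2, z2 in zip(weights, vertices))
           for D in subsets(C)}
```

The equality `zeta_sq == zeta_91` is exactly (90)-(91); the non-negativity of `sa_terms(zeta_sq, C)` is the conclusion of the lemma under the stated assumption on $\mathcal{Z}_C$.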

A.3. LI construction 

In §2.7 we used a representation of the coefficients $P_{C,x_C,x'_C}$ of the linear extension $P_\zeta$ in the form of a polynomial (39). This representation is obtained as follows. Starting from definition (36), we express:

$$P_{C,x_C,x'_C} = \prod_{s\in C}\Big(\big([x'_s=x_s]-[y_s=x_s]\big)\zeta_{s,x'_s}+[y_s=x_s]\Big) = \sum_{D\subseteq C}c_{C,D}(x_C,x'_C)\prod_{s\in D}\zeta_{s,x'_s}, \tag{92}$$

where

$$c_D(x_s,x'_s) = \begin{cases}[x'_s=x_s]-[y_s=x_s], & s\in D,\\ [y_s=x_s], & s\in C\setminus D,\end{cases}\qquad c_{C,D}(x_C,x'_C) = \prod_{s\in C}c_D(x_s,x'_s). \tag{93}$$

sec 

Theorem 2.16. Inequalities (41c) in the definition of polytope $\mathcal{Z}$ can be equivalently replaced with $P_\zeta\ge0$.

Proof. The fact that inequalities (41c) imply $P_\zeta\ge0$, assuming equality constraints (41a)-(41b), was shown in §2.7. We show now that $P_\zeta\ge0$ implies inequalities (41c).

The inequality $P_\zeta\ge0$ for the linear mapping $P_\zeta$ means that all its matrix elements are non-negative, i.e., the defining coefficients $P_{C,x_C,x'_C}$ for $C\in\mathcal{E}$, $x_C\in\mathcal{X}_C$, $x'_C\in\mathcal{X}_C$ are non-negative. Let us detail the constraint

$$P_{C,x_C,x'_C}\ge0. \tag{94}$$

Let

$$A = \{s\in C\mid x_s\neq x'_s\},\qquad B = \{s\in C\mid x_s\neq y_s\}, \tag{95}$$

and let $\bar A$, $\bar B$, $\bar D$ denote complements in $C$. From (93) we have

$$c_D(x_s,x'_s) = \begin{cases}1, & s\in\bar A\cap B\cap D,\\ -1, & s\in A\cap\bar B\cap D,\\ 0, & s\in\overline{(A\triangle B)}\cap D,\\ 1, & s\in\bar B\cap\bar D,\\ 0, & s\in B\cap\bar D.\end{cases} \tag{96}$$

Coefficient $c_{C,D}(x_C,x'_C)$ in (93), which is the product of (96) over $s\in C$, expresses as

$$c_{C,D}(x_C,x'_C) = (-1)^{|A\cap\bar B\cap D|}\,0^{|D\setminus(A\triangle B)|}\,0^{|B\setminus D|}, \tag{97}$$

with the convention $0^0 = 1$. It is non-zero only when $D\setminus(A\triangle B) = \emptyset$ and $B\setminus D = \emptyset$ or, equivalently,

$$B\subseteq D,\quad D\subseteq A\triangle B; \tag{98a}$$

$$B\subseteq D,\quad A\subseteq\bar B,\quad D\subseteq A\cup B. \tag{98b}$$

In particular, $P_{C,x_C,x'_C} = 0$ when $A\cap B\neq\emptyset$. For $A\cap B = \emptyset$, using sets $A$, $B$ and coefficients (97) we obtain

$$P_{C,x_C,x'_C} = \sum_{D:\ B\subseteq D\subseteq A\cup B}(-1)^{|D\setminus B|}\zeta_{D,x'_D}\ge0, \tag{99}$$

which is equivalent to

$$\sum_{D\subseteq A}(-1)^{|D|}\zeta_{B\cup D,x'_{B\cup D}}\ge0. \tag{100}$$

In order to obtain inequalities (41c) we need to show that for all $x'_C\in\mathcal{X}_C$ with $x'_s\neq y_s$ for all $s\in C$, by varying $x_C\in\mathcal{X}_C$ the set $B$ ranges over all subsets of $C$ while at the same time $A$ equals $\bar B$. Because $x'_s\neq y_s$ for all $s\in C$, an arbitrary given set $B$ can be realized by the choice $x_s = x'_s$ if $s\in B$ and $x_s = y_s$ otherwise. At the same time, for this choice of $x$ there holds $A = \bar B$. □
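The case analysis (96)-(99) can be cross-checked by brute force. The sketch below, with hypothetical helper names, evaluates $P_{C,x_C,x'_C}$ once through the expansion (92)-(93) and once through the closed form (99), for three variables with labels $\{0,1,2\}$ and $y = (0,0,0)$. We read $\prod_{s\in D}\zeta_{s,x'_s}$ in (92) as the lifted variable $\zeta_{D,x'_D}$ of (99), which is an assumption on our part; since the identity is purely algebraic, random values of $\zeta$ (with $\zeta_\emptyset = 1$) suffice and feasibility in $\mathcal{Z}$ is not needed.

```python
from fractions import Fraction
from itertools import combinations, product
import random

LABELS = (0, 1, 2)


def subsets(items):
    items = tuple(sorted(items))
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def restrict(x, C, D):
    """x_D as a tuple, in the order of C."""
    return tuple(x[i] for i, s in enumerate(C) if s in D)


def entry_92(C, x, xp, y, zeta):
    """P_{C,x,x'} = sum_D c_{C,D}(x, x') zeta_{D,x'_D} with the coefficients (93)."""
    total = 0
    for D in subsets(C):
        coef = 1
        for i, s in enumerate(C):
            if s in D:
                coef *= int(xp[i] == x[i]) - int(y[i] == x[i])
            else:
                coef *= int(y[i] == x[i])
        total += coef * zeta[(D, restrict(xp, C, D))]
    return total


def entry_99(C, x, xp, y, zeta):
    """Closed form (99); the entry vanishes unless A and B are disjoint, cf. (98b)."""
    A = frozenset(s for i, s in enumerate(C) if x[i] != xp[i])
    B = frozenset(s for i, s in enumerate(C) if x[i] != y[i])
    if A & B:
        return 0
    return sum((-1) ** len(Dp) * zeta[(B | Dp, restrict(xp, C, B | Dp))] for Dp in subsets(A))


def realization(xp, B, C, y):
    """Labeling with x_s = x'_s on B and x_s = y_s elsewhere (end of the proof)."""
    return tuple(xp[i] if s in B else y[i] for i, s in enumerate(C))


C, y = ("r", "s", "t"), (0, 0, 0)
rng = random.Random(2)
zeta = {(D, xd): Fraction(rng.randint(-4, 4), 4)
        for D in subsets(C) for xd in product(LABELS, repeat=len(D))}
zeta[(frozenset(), ())] = Fraction(1)
X = list(product(LABELS, repeat=len(C)))
```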

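Lemma 2.17. For $\zeta\in\mathcal{Z}$ there holds $P_\zeta^2 = P_{\zeta^2}$.

Proof. Let us calculate the expression of $(P_\zeta^2)_{C,x_C,x''_C}$. It is equal to

$$\sum_{x'_C\in\mathcal{X}_C}P_{C,x_C,x'_C}\,P_{C,x'_C,x''_C} = \sum_{D_1\subseteq C}\ \sum_{D_2\subseteq C}T_{D_1,D_2}, \tag{101a}$$

$$T_{D_1,D_2} = \sum_{x'_C\in\mathcal{X}_C}\ \prod_{s\in C}\big(c_{D_1}(x_s,x'_s)\,c_{D_2}(x'_s,x''_s)\big)\ \zeta_{D_1,x'_{D_1}}\,\zeta_{D_2,x''_{D_2}}. \tag{101b}$$

The expression in line (101b) factors as

$$\Big(\sum_{x'_{D_1}}\ \prod_{s\in D_1}c_{D_1}(x_s,x'_s)\,c_{D_2}(x'_s,x''_s)\ \zeta_{D_1,x'_{D_1}}\Big) \tag{102a}$$

$$\times\Big(\prod_{s\in C\setminus D_1}\ \sum_{x'_s\in\mathcal{X}_s}c_{D_1}(x_s,x'_s)\,c_{D_2}(x'_s,x''_s)\Big)\ \zeta_{D_2,x''_{D_2}}. \tag{102b}$$

The term of the second factor for $s\in C\setminus D_1$ equals:

• Case $s\notin D_1$, $s\in D_2$:

$$\sum_{x'_s}[y_s=x_s]\big(-[y_s=x'_s]+[x''_s=x'_s]\big) = 0. \tag{103}$$

• Case $s\notin D_1$, $s\notin D_2$:

$$\sum_{x'_s}[y_s=x_s][y_s=x'_s] = [y_s=x_s]. \tag{104}$$

It follows from (103) that for $D_2\not\subseteq D_1$ the second factor vanishes and hence expression (101b) vanishes. In the first factor the coefficient $c_{D_1}(x_s,x'_s)\,c_{D_2}(x'_s,x''_s)$ expresses as:

• Case $s\in D_1$, $s\in D_2$:

$$\big(-[y_s=x_s]+[x'_s=x_s]\big)\big(-[y_s=x'_s]+[x''_s=x'_s]\big) = \big(-[y_s=x_s]+[x''_s=x_s]\big)[x'_s=x''_s]. \tag{105}$$

• Case $s\in D_1$, $s\notin D_2$:

$$\big(-[y_s=x_s]+[x'_s=x_s]\big)[y_s=x'_s] = -[y_s=x_s=x'_s]+[x'_s=x_s=y_s] = 0. \tag{106}$$

Therefore, if $D_1\not\subseteq D_2$, for each value of $x'_{D_1}$ the product $\prod_{s\in D_1}c_{D_1}(x_s,x'_s)\,c_{D_2}(x'_s,x''_s)$ vanishes, and hence the sum (102a) and the expression (101b) vanish. It follows that we need to count expression (101b) only for the case $D_1 = D_2 =: D$. In this case, carrying out the summation over $x'_D$ in (102a), we obtain

$$\prod_{s\in D}\big(-[y_s=x_s]+[x''_s=x_s]\big)\ \zeta_{D,x''_D}. \tag{107}$$

From the full expression (101) there remains

$$\sum_{D\subseteq C}\ \prod_{s\in D}\big(-[y_s=x_s]+[x''_s=x_s]\big)\prod_{s\in C\setminus D}[y_s=x_s]\ \zeta_{D,x''_D}\,\zeta_{D,x''_D} = \sum_{D\subseteq C}c_{C,D}(x_C,x''_C)\,(\zeta^2)_{D,x''_D} = (P_{\zeta^2})_{C,x_C,x''_C}. \tag{108}$$

The claim $(P_\zeta)^2 = P_{\zeta^2}$ is proven. □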
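The derivation (101)-(108) uses no property of $\zeta$ beyond the form of the coefficients (92)-(93), so the identity $P_\zeta^2 = P_{\zeta^2}$ can be tested with arbitrary values. A minimal sketch, assuming a two-variable clique with labels $\{0,1,2\}$ and $y = (0,0)$, and reading $\prod_{s\in D}\zeta_{s,x'_s}$ in (92) as the lifted variable $\zeta_{D,x'_D}$ that appears in (108); matrices have rows indexed by $x_C$ and columns by $x'_C$, and all helper names are ours.

```python
from fractions import Fraction
from itertools import combinations, product
import random

LABELS = (0, 1, 2)


def subsets(items):
    items = tuple(sorted(items))
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def restrict(x, C, D):
    return tuple(x[i] for i, s in enumerate(C) if s in D)


def coef(C, D, x, xp, y):
    """c_{C,D}(x_C, x'_C): product over s in C of the two cases in (93)."""
    value = 1
    for i, s in enumerate(C):
        value *= (int(xp[i] == x[i]) - int(y[i] == x[i])) if s in D else int(y[i] == x[i])
    return value


def build_P(C, y, zeta):
    """Matrix of P_zeta on X_C with entries (92)."""
    X = list(product(LABELS, repeat=len(C)))
    return [[sum(coef(C, D, x, xp, y) * zeta[(D, restrict(xp, C, D))] for D in subsets(C))
             for xp in X] for x in X]


def matmul(M, N):
    n = len(M)
    return [[sum(M[i][k] * N[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


C, y = ("s", "t"), (0, 0)
rng = random.Random(3)
zeta = {(D, xd): Fraction(rng.randint(0, 4), 4)
        for D in subsets(C) for xd in product(LABELS, repeat=len(D))}
zeta[(frozenset(), ())] = Fraction(1)
zeta_sq = {key: v * v for key, v in zeta.items()}       # component-wise square

P = build_P(C, y, zeta)
P_of_square = build_P(C, y, zeta_sq)
```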

A.4. Maximum Persistency for Local Relaxations

Lemma 3.1. Node-wise mapping $[p]$ preserves the local polytope $\Lambda$.

Proof. Clearly, $[p]$ preserves non-negativity. Let $\mu$ satisfy marginalization constraint (52) for some $C\in\mathcal{E}$, $D\subset C$ and $x_D\in\mathcal{X}_D$. Then

$$\sum_{x_{C\setminus D}}([p]\mu)_C(x_C) = \sum_{x_{C\setminus D}}\ \sum_{x'_C\in\mathcal{X}_C}[p_C(x'_C)=x_C]\,\mu_C(x'_C)$$

$$= \sum_{x'_C}\Big(\sum_{x_{C\setminus D}}[p_C(x'_C)=x_C]\Big)\mu_C(x'_C) \tag{109a}$$

$$= \sum_{x'_C}[p_D(x'_D)=x_D]\,\mu_C(x'_C) \tag{109b}$$

$$= \sum_{x'_D}[p_D(x'_D)=x_D]\,\mu_D(x'_D) = ([p]\mu)_D(x_D). \tag{109c}$$

□
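The computation (109) says that a node-wise mapping commutes with marginalization. A small numerical illustration for a single pair $C = \{s,t\}$ with three labels: the maps `p_s` and `p_t` are hypothetical examples, and $\mu$ is an arbitrary non-negative normalized pairwise table together with its own marginals, so that (52) holds by construction.

```python
from fractions import Fraction
from itertools import product
import random

LABELS = (0, 1, 2)


def push_forward(mu, maps):
    """([p]mu)(x) = sum over x' of [p(x') = x] mu(x'), for node-wise p (one dict per variable)."""
    out = {x: Fraction(0) for x in mu}
    for xp, weight in mu.items():
        out[tuple(p[v] for p, v in zip(maps, xp))] += weight
    return out


def marginal(mu_pair, axis):
    """Marginalize a pairwise table onto one of its two variables, cf. (52)."""
    out = {(label,): Fraction(0) for label in LABELS}
    for x, weight in mu_pair.items():
        out[(x[axis],)] += weight
    return out


rng = random.Random(4)
raw = {x: Fraction(rng.randint(1, 9)) for x in product(LABELS, repeat=2)}
total = sum(raw.values())
mu_st = {x: w / total for x, w in raw.items()}
mu_s, mu_t = marginal(mu_st, 0), marginal(mu_st, 1)      # (52) holds by construction

p_s = {0: 0, 1: 0, 2: 2}   # hypothetical node-wise maps
p_t = {0: 1, 1: 1, 2: 2}

pmu_st = push_forward(mu_st, (p_s, p_t))
pmu_s = push_forward(mu_s, (p_s,))
pmu_t = push_forward(mu_t, (p_t,))
```

Marginalizing `pmu_st` reproduces `pmu_s` and `pmu_t`, which is (109a)-(109c) for this instance.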


Lemma 3.2. Mapping $P_\zeta$ for $\zeta\in\mathcal{Z}$ preserves the local polytope: $P_\zeta(\Lambda)\subset\Lambda$.

Proof. We need to prove that $(\forall\mu\in\Lambda)$

$$(P_\zeta\mu)_0 = 1; \tag{110a}$$

$$P_\zeta\mu\ge0; \tag{110b}$$

$$AP_\zeta\mu = 0. \tag{110c}$$

Constraints (110a), (110b) are satisfied for any relaxation polytope (same as (43)). Constraint (110c) is local: for each $C\in\mathcal{E}$ it is given by equalities (52) for $(P_\zeta\mu)_C$, and hence involves only $(\zeta_D\mid D\subseteq C)$. This constraint holds for all integer $\zeta$ by Lemma 3.1 and thus also for all $\zeta\in\mathcal{Z}$. □

Theorem 3.3 (Characterizations). For a local relaxation $\Lambda$ all of the following are equivalent:

(a) $P\in\mathbb{W}_f$;

(b) $(\exists\varphi\in\mathbb{R}^m)\ f^{\varphi}-P^Tf\ge0$;

(c) $(\exists\varphi)\ (I-P^T)f^{\varphi}\ge0$;

(d) $\inf\{\langle f-P^Tf,\mu\rangle\mid\mu\in\mathbb{R}^{\mathcal{I}}_+,\ A(I-P)\mu = 0\} = 0$.

Proof. We already have (a) $\Leftrightarrow$ (b) by Theorem 2.8 and (a) $\Rightarrow$ (d) $\Leftrightarrow$ (c) by Theorem 2.10 (the equivalence is the duality relation discussed in its proof). Let us prove (c) $\Rightarrow$ (a). Let $\mu\in\Lambda$. We multiply the component inequalities (c) with the non-negative numbers $\mu_C(x_C)$ and sum over $x_C$ and $C$. We get the inequality

$$\langle P^Tf,\mu\rangle - \langle P^TA^T\varphi,\mu\rangle\le\langle f,\mu\rangle - \langle A^T\varphi,\mu\rangle.$$

The sum $\langle A^T\varphi,\mu\rangle = \langle\varphi,A\mu\rangle$ vanishes because $A\mu = 0$, and similarly $\langle P^TA^T\varphi,\mu\rangle = \langle\varphi,AP\mu\rangle = 0$ since, by Lemma 3.1, $P\mu\in\Lambda$ and thus $AP\mu = 0$. The remaining inequality proves that $P\in\mathbb{W}_f$. □

A.5. NP Hardness Results 

Theorem 3.4. Problem (max-wi) over the $\mathcal{P}^1$ class of maps and the BLP relaxation is solvable in polynomial time for the quadratic pseudo-Boolean case and otherwise (when the problem is multilabel or higher order) it is NP-hard.

Proof. In the quadratic pseudo-Boolean case the solution is given by the optimal relaxed labeling with the maximum number of integer components. For the more special case of the vertex packing problem this was proven to be polynomial by Picard and Queyranne [46], whose proof we extended in [55, Statement 5]. For the general quadratic pseudo-Boolean case, the solution can be found efficiently by analyzing connected components in the network flow model [7], [32, §2.3].

Next we prove that (max-wi) is NP-hard if either there are more than two labels or the problem has cubic or higher-order terms. We use a reduction from the constraint satisfaction problem (CSP), which is NP-complete already for pairwise constraints when variables can take 3 or more values, and for Boolean variables when constraints can couple 3 or more variables at a time (this case includes 3-SAT).


This problem can be represented as energy minimization with constraints $f_C\colon\mathcal{X}_C\to\{0,1\}$. The CSP is satisfiable iff the minimum of the energy is zero. Let $V$ be the value of the LP relaxation. Then either it is larger than zero, in which case the CSP is not satisfiable, or it equals zero. In the latter case, the CSP is satisfiable iff there exists an integer solution with cost zero, i.e., iff the relaxation is tight. This is the case iff (max-wi) determines all variables as persistent. Thus, if (max-wi) were solvable in polynomial time, we could solve CSP in polynomial time, which is a contradiction unless P = NP. □


Theorem 3.5. Problem (max-si) with 4 or more labels over the class of maps $\mathcal{P}^2 = \bigcup_{y\in\mathcal{X}}\mathcal{P}^{2,y}$ and the BLP relaxation is NP-hard.


Proof. We again use a reduction from CSP with 3 labels, $\mathcal{X}_s = \{1,2,3\}$. Let the constraints of CSP be defined by $g_C\colon\mathcal{X}_C\to\{0,1\}$. We construct an energy minimization problem with 4 labels as follows:

$$f_C(x_C) = \begin{cases}g_C(x_C), & x_C\in\{1,2,3\}^C;\\ \varepsilon, & x_C = 4^C;\\ B, & \text{otherwise},\end{cases} \tag{111}$$

where $B>|\mathcal{E}|$ and $\varepsilon<1/|\mathcal{E}|$. Let $V$ be the value of the BLP relaxation. If it is larger than zero, then the CSP is not satisfiable. Otherwise, the relaxation is tight iff the CSP is satisfiable. It is clear that if an integer solution $y^*$ of zero cost exists, it must take values in $\{1,2,3\}$. If it exists, then the mapping $p_s\colon(1,2,3,4)\mapsto(1,2,3,y^*_s)$ is strictly relaxed-improving, as it replaces components of cost $\varepsilon>0$ with components of cost 0. Let $q$ be the maximum strictly relaxed-improving mapping in class $\mathcal{P}^2$. Let $x_s = 4$ for all $s\in\mathcal{V}$. If $(\exists s\in\mathcal{V})\ q(x)_s = 4$, then the CSP is not satisfiable: if it were, $p$ would exist, and the maximum mapping $q$, which is not smaller than $p$, would map every label 4 to a label in $\{1,2,3\}$. If instead $(\forall s\in\mathcal{V})\ q(x)_s\neq4$, then, since $q$ is improving, $E_f(q(x))\le E_f(x) = |\mathcal{E}|\varepsilon<1$. It follows that $g(q(x)) = 0$ and hence $q(x)$ is a solution to CSP. We showed that CSP is reduced in polynomial time to (max-si). □
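For concreteness, here is a sketch of the construction (111) on a toy CSP, the 3-colouring of a triangle, which we chose for illustration. It builds the four-label energy, confirms by brute force that its minimum is zero, and checks that the map $p_s\colon(1,2,3,4)\mapsto(1,2,3,y^*_s)$ does not increase the energy of any labeling. Only the integer version of the improving property is tested; the relaxed statement used in the proof is not.

```python
from fractions import Fraction
from itertools import product


def build_energy(csp_terms, eps, big):
    """Terms (111): g_C on {1,2,3}^C, eps on the all-4 labeling, big (the constant B) otherwise."""
    def make(g):
        def f(xc):
            if all(v in (1, 2, 3) for v in xc):
                return g(xc)
            if all(v == 4 for v in xc):
                return eps
            return big
        return f
    return {C: make(g) for C, g in csp_terms.items()}


def energy(terms, x, index):
    return sum(f(tuple(x[index[v]] for v in C)) for C, f in terms.items())


def same_colour(xc):
    """CSP cost: 1 if the 'different colours' constraint is violated, else 0."""
    return int(xc[0] == xc[1])


# A hypothetical CSP instance: 3-colouring of a triangle.
variables = ("a", "b", "c")
index = {v: i for i, v in enumerate(variables)}
csp = {("a", "b"): same_colour, ("b", "c"): same_colour, ("a", "c"): same_colour}
eps, big = Fraction(1, len(csp) + 1), len(csp) + 1      # eps < 1/|E| and B > |E|
terms = build_energy(csp, eps, big)

y_star = (1, 2, 3)                                      # a zero-cost labeling of the CSP


def p(x):
    """The map p_s: (1, 2, 3, 4) -> (1, 2, 3, y*_s) from the proof."""
    return tuple(y_star[i] if v == 4 else v for i, v in enumerate(x))


labelings = list(product((1, 2, 3, 4), repeat=len(variables)))
```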


A.6. Method of Swoboda et al. 

Theorem 4.5. Persistency by method [64] in the pairwise multilabel case corresponds to an FLP-improving mapping.

Proof. The method constructs a subset $\mathcal{A}\subset\mathcal{V}$, a labeling $y$ on $\mathcal{A}$ and an auxiliary energy $E_g$ defined by:

$$(\forall s\in\mathcal{A})\quad g_s = f_s, \tag{112}$$

$$(\forall st\in\mathcal{E},\ s\in\mathcal{A},\ t\in\mathcal{A})\quad g_{st} = f_{st},$$

$$(\forall st\in\mathcal{E},\ s\in\mathcal{A},\ t\notin\mathcal{A},\ \forall ij)\quad g_{st}(i,j) = \begin{cases}\max_{j'}f_{st}(i,j'), & i = y_s,\\ \min_{j'}f_{st}(i,j'), & i\neq y_s,\end{cases} \tag{113}$$

with the remaining terms set to zero. It can be seen that energy $E_g$ depends on the assignment of $y$ only on the boundary $\partial\mathcal{A} = \{s\in\mathcal{A}\mid(\exists t\in\mathcal{V}\setminus\mathcal{A})\ \{s,t\}\in\mathcal{E}\}$. Let us extend $y$ to $\mathcal{V}$ in an arbitrary way, e.g., by $y_{\mathcal{V}\setminus\mathcal{A}} = 0$. The sufficient condition of [64, Corollary 1] implies that $\delta(y)\in\operatorname{argmin}_{\mu\in\Lambda}\langle g,\mu\rangle$ (the relaxation is tight). We construct the mapping $p$ as

$$p_s(i) = \begin{cases}y_s, & \text{if } s\in\mathcal{A},\\ i, & \text{if } s\notin\mathcal{A},\end{cases} \tag{114}$$

i.e., $p$ replaces the part of a labeling $x$ on $\mathcal{A}$ with the labeling $y$. This mapping is illustrated in Figure 11.
We claim that mapping $p$ is relaxed-improving.

We first show that $g$ is auxiliary for $f$, i.e., that

$$(\forall\mu\in\Lambda)\quad\langle(I-P)^Tf,\mu\rangle\ge\langle(I-P)^Tg,\mu\rangle. \tag{115}$$

We trivially have $f_s(i) - f_s(p_s(i)) = g_s(i) - g_s(p_s(i))$. We also have equality of pairwise terms

$$f_{st}(i,j) - f_{st}(p_s(i),p_t(j)) = g_{st}(i,j) - g_{st}(p_s(i),p_t(j))$$

for $st\in\mathcal{E}$ in all of the following cases:

(a) $s\in\mathcal{A}$ and $t\in\mathcal{A}$;

(b) $s\notin\mathcal{A}$ and $t\notin\mathcal{A}$;

(c) $s\in\mathcal{A}$ and $t\notin\mathcal{A}$, $i = y_s$.

It remains to verify the inequality for boundary pairs $s\in\mathcal{A}$, $t\notin\mathcal{A}$ in the case $i\neq y_s$. We have

$$\begin{aligned}f_{st}(i,j) - f_{st}(p_s(i),p_t(j)) &\ge\min_{j'}\big(f_{st}(i,j') - f_{st}(p_s(i),p_t(j'))\big)\\ &\ge\min_{j'}f_{st}(i,j') - \max_{j'}f_{st}(y_s,p_t(j'))\\ &= g_{st}(i,j) - g_{st}(p_s(i),p_t(j)).\end{aligned} \tag{116}$$

Because these component-wise inequalities hold, it follows that (115) holds.

The second step is to show that $p$ is relaxed-improving for $g$. By assumption, we have $\delta(y)\in\operatorname{argmin}_{\mu\in\Lambda}\langle g,\mu\rangle$. Given a labeling $x$, mapping $p$ replaces its part over $\mathcal{A}$ with the optimal labeling $y$. It follows that $(\forall\mu\in\Lambda)\ \langle g,P\mu\rangle = \langle g,\delta(y)\rangle\le\langle g,\mu\rangle$. Combining this inequality with (115), we obtain that $[p]\in\mathbb{W}_f$. □
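The first step of the proof, that $g$ is auxiliary for $f$, can be checked by brute force on a small pairwise model. In the sketch below the subset $\mathcal{A}$, the labeling $y$ and the random potentials are hypothetical, and edges with exactly one end in $\mathcal{A}$ are assumed to be stored as $(s,t)$ with $s\in\mathcal{A}$. The test compares $E_f(x)-E_f(p(x))$ with $E_g(x)-E_g(p(x))$ for all labelings, i.e., the integer counterpart of (115); it does not implement the sufficient condition of [64, Corollary 1].

```python
from itertools import product
import random


def auxiliary_terms(unary, pairwise, A, y):
    """Terms of E_g after (112)-(113); an edge with one end in A is stored as (s, t), s in A."""
    g_unary = {s: (list(f) if s in A else [0] * len(f)) for s, f in unary.items()}
    g_pair = {}
    for (s, t), f in pairwise.items():
        if s in A and t in A:                       # (112): inner pairs are copied
            g_pair[(s, t)] = [row[:] for row in f]
        elif s in A:                                # (113): boundary pair with t outside A
            g_pair[(s, t)] = [[max(row) if i == y[s] else min(row)] * len(row)
                              for i, row in enumerate(f)]
        else:                                       # remaining terms are zero
            assert t not in A, "store boundary edges as (s, t) with s in A"
            g_pair[(s, t)] = [[0] * len(row) for row in f]
    return g_unary, g_pair


def energy(unary, pairwise, x):
    return (sum(f[x[s]] for s, f in unary.items())
            + sum(f[x[s]][x[t]] for (s, t), f in pairwise.items()))


def apply_p(x, A, y):
    """Mapping (114): replace the labels on A by y."""
    return tuple(y[s] if s in A else label for s, label in enumerate(x))


rng = random.Random(5)
n, k = 4, 3
A, y = {0, 1}, {0: 2, 1: 0}                          # hypothetical subset and labeling on it
unary = {s: [rng.randint(0, 9) for _ in range(k)] for s in range(n)}
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
pairwise = {e: [[rng.randint(0, 9) for _ in range(k)] for _ in range(k)] for e in edges}
g_unary, g_pair = auxiliary_terms(unary, pairwise, A, y)
labelings = list(product(range(k), repeat=n))
```

The second assertion in the test reflects the second step of the proof: $E_g(p(x))$ does not depend on $x$, because $E_g$ only depends on the labels in $\mathcal{A}$.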


A.7. Quadratization Techniques

In this section we define a sufficient set of atomic reductions to represent methods [22, 15]. We first define a rather general reduction.

Definition A.1. An injective reduction is a procedure that, for a given energy minimization problem described by $\mathcal{V}$, $\mathcal{E}$, $\mathcal{X}$, $f$, specifies:

• The reduced energy minimization problem described by $\mathcal{V}'$, $\mathcal{E}'$, $\mathcal{X}'$, $f'$;

• An injective mapping $\pi\colon\mathcal{X}\to\mathcal{X}'$;

• A left inverse of $\pi$, a mapping $\sigma\colon\operatorname{dom}(\sigma)\to\mathcal{X}$, where $\mathcal{X}'\supseteq\operatorname{dom}(\sigma)\supseteq\pi(\mathcal{X})$.

The energies are related by $E_{f'}(\pi(x)) = E_f(x)$ for all $x\in\mathcal{X}$.

Mapping $\pi$ establishes a correspondence between labelings that preserves distinctness. Its left inverse always exists but may be non-unique (when $\pi(\mathcal{X})\neq\mathcal{X}'$). For some labelings in $\mathcal{X}'$ there may be no meaningful correspondence in $\mathcal{X}$. For this reason the domain of $\sigma$ is allowed to be specified. We introduce the following atomic injective reductions. First, the ones not changing the hypergraph:

• reparametrize: Let $f' := f - A^T\varphi$, where $A$ is the matrix of a local relaxation. Mappings $\pi$ and $\sigma$ are identity.

• permute_labels: Apply a permutation (a bijection) of labels: for each $s\in\mathcal{V}$, mapping $\pi_s\colon\mathcal{X}_s\to\mathcal{X}_s$ is a bijection, $\mathcal{X}'_s = \mathcal{X}_s$ and $\sigma_s = \pi_s^{-1}$. The reduced energy is $f'_C(\pi(x_C)) = f_C(x_C)$.

• add_labels: Expand variable domains as $\mathcal{X}_s\subset\mathcal{X}'_s$ and extend the objective component-wise: let $f'_C(x'_C) = f_C(x'_C)$ for $x'_C\in\mathcal{X}_C$ and arbitrary for $x'_C\in\mathcal{X}'_C\setminus\mathcal{X}_C$. Mapping $\pi\colon\mathcal{X}\to\mathcal{X}'$ is the injection $x\mapsto x$. Mapping $\sigma\colon\mathcal{X}\to\mathcal{X}$ is the identity ($\operatorname{dom}(\sigma) = \mathcal{X}$).

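Now, atomic reductions changing the hypergraph:

• remove_zero: If the term $f_C(x_C) = 0$ for all $x_C\in\mathcal{X}_C$, exclude $C$ from $\mathcal{E}$: let $\mathcal{E}' = \mathcal{E}\setminus\{C\}$. Mapping $\pi$ is the identity. This operation is the converse of adding zero interactions in [72].

• clique_aux: For a given $C\in\mathcal{E}$, introduce a new node $\omega$, an auxiliary variable $y\in\mathcal{X}_\omega$ and represent the term $f_C$ as

$$f_C(x_C) = \min_y f'_{C\cup\{\omega\}}(x_C,y). \tag{117}$$

Define $\mathcal{V}' = \mathcal{V}\cup\{\omega\}$ and

$$\mathcal{E}' = \mathcal{E}\cup\{D\cup\{\omega\}\mid D\in\mathcal{E},\ D\subseteq C\}. \tag{118}$$

For hyperedges $D\in\mathcal{E}\setminus\{C\}$ the energy terms are copied, $f'_D(x_D) = f_D(x_D)$, and the remaining terms of $f'$ (other than in $(\mathcal{E}\setminus\{C\})\cup\{C\cup\{\omega\}\}$) are zero. The new labeling domain is $\mathcal{X}' = \mathcal{X}\times\mathcal{X}_\omega$. Mapping $\pi\colon\mathcal{X}\to\mathcal{X}'$ is defined as $\pi\colon x\mapsto(x,y(x))$, where

$$y(x) = y(x_C)\in\operatorname*{argmin}_y f'_{C\cup\{\omega\}}(x_C,y). \tag{119}$$

Mapping $\sigma\colon\mathcal{X}'\to\mathcal{X}$ is the restriction $(x,y)\mapsto x$.

• group_aux: Let $\mathcal{H}\subseteq\mathcal{E}$ and the intersection $C = \bigcap_{H\in\mathcal{H}}H\in\mathcal{E}$. Introduce a new node $\omega$ and a variable $y\in\mathcal{X}_\omega$. Let the function $y\colon\mathcal{X}_C\to\mathcal{X}_\omega$ be such that $(\forall H\in\mathcal{H})$

$$f_H(x_H) = f'_{H\cup\{\omega\}}(x_H,y(x_C)) = \min_{y\in\mathcal{X}_\omega}f'_{H\cup\{\omega\}}(x_H,y).$$

Define $\mathcal{V}' = \mathcal{V}\cup\{\omega\}$ and

$$\mathcal{E}' = \mathcal{E}\cup\bigcup_{H\in\mathcal{H}}\{D\cup\{\omega\}\mid D\in\mathcal{E},\ D\subseteq H\}. \tag{120}$$

Define $\pi\colon\mathcal{X}\to\mathcal{X}'\colon x\mapsto(x,y(x_C))$ and $\sigma\colon\mathcal{X}'\to\mathcal{X}\colon(x,y)\mapsto x$.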

In order to relate relaxed-improving maps before and after the reduction we need a correspondence between them. However, not all maps $p'\colon\mathcal{X}'\to\mathcal{X}'$ of the reduced problem make sense for the initial problem.

Definition A.2. A node-wise mapping $p'\colon\mathcal{X}'\to\mathcal{X}'$ is admissible for a reduction if $p'(\pi(\mathcal{X}))\subseteq\operatorname{dom}(\sigma)$. In this case $p = \sigma\circ p'\circ\pi$ is the corresponding mapping for $p'$.

All the above atomic reductions fulfill the requirements of Definition A.1 and, with the exception of the add_labels reduction, all mappings $p'\colon\mathcal{X}'\to\mathcal{X}'$ are admissible.

Definition A.3. A reduction has inclusion of relaxed-improving maps for relaxations $\Lambda$ and $\Lambda'$ if for every admissible $\Lambda'$-improving mapping $p'$ for $f'$ its corresponding mapping $p$ is $\Lambda$-improving for $f$.

Theorem A.4. Atomic reductions reparametrize and permute_labels have inclusion of relaxed-improving maps for local relaxations.

Proof. Let $\Lambda$ be a local relaxation. Let $p'$ be $\Lambda$-improving for $f'$. In the case of permute_labels, marginalization constraints are invariant w.r.t. the order of labels. We trivially get that $\sigma\circ p'\circ\pi$ is $\Lambda$-improving for $f$. In the case of reparametrize, there holds $\langle f-A^T\varphi,\mu\rangle = \langle f,\mu\rangle$ for all $\mu\in\Lambda$ as well as for all $\mu\in[p'](\Lambda)\subset\Lambda$, and so $p = p'$ is $\Lambda$-improving for $f$. □


To a hypergraph $(\mathcal{V},\mathcal{E})$ let us associate an induced local relaxation $\Lambda_{\mathcal{E}}$ as follows. For $A,B\subseteq\mathcal{V}$ let $A\lhd B$ if $A,B\in\mathcal{E}$ and $A\subsetneq B$. Let $\Lambda_{\mathcal{E}}$ be the local relaxation for the coupling $\lhd$.

Theorem A.5. Atomic reductions remove_zero and add_labels have inclusion of admissible relaxed-improving maps for local relaxations $\Lambda_{\mathcal{E}}$ and $\Lambda' = \Lambda_{\mathcal{E}'}$.

Proof (remove_zero). Because $\mathcal{E}'$ does not contain hyperedge $C$, all variables $\mu_C$, as well as the associated marginalization constraints, are absent in $\Lambda'$. Polytope $\Lambda'$ can still be represented in the same space as $\Lambda$, with unconstrained variables $\mu'_C(x_C)$. In this representation $\Lambda\subseteq\Lambda'$ and Theorem 4.1 applies. □

Proof (add_labels). Let $p'$ be relaxed-improving for $f'$:

$$(\forall\mu'\in\Lambda')\quad\langle f',[p']\mu'\rangle\le\langle f',\mu'\rangle. \tag{121}$$

By the dual representation of Theorem 3.3(b), there exist dual multipliers $\varphi$ for $\Lambda'$ such that

$$(\forall C\in\mathcal{E},\ \forall x'_C\in\mathcal{X}'_C)\quad f'_C(p'(x'_C))\le f'_C(x'_C) - ((A')^T\varphi)_C(x'_C). \tag{122}$$

It must be that $p'_s(\mathcal{X}_s)\subseteq\mathcal{X}_s$, otherwise $p'$ is not admissible. The corresponding mapping $p_s\colon\mathcal{X}_s\to\mathcal{X}_s$ is the restriction of $p'_s$ to $\mathcal{X}_s$. Restricting (122) to $\mathcal{X}_C$ we obtain $(\forall C\in\mathcal{E},\ \forall x_C\in\mathcal{X}_C)$

$$f_C(p(x_C))\le f_C(x_C) - ((A')^T\varphi)_C(x_C). \tag{123}$$

The RHS expression is of the form (55) and thus is a valid reparametrization of $f$ in $\Lambda$. Therefore (123) expresses component-wise inequalities and, by Theorem 3.3(b), $p$ is relaxed-improving for $f$. □

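Theorem A.6. Atomic reductions clique_aux and group_aux have inclusion of relaxed-improving maps for local relaxations $\Lambda_{\mathcal{E}}$ and $\Lambda' = \Lambda_{\mathcal{E}'}$.

Proof (clique_aux). First, let us show that a relaxed solution $\mu\in\Lambda$ can be mapped to a relaxed solution of the reduced problem $\mu'\in\Lambda'$. Let

$$(\forall D\in\mathcal{E})\ (\forall x_D\in\mathcal{X}_D)\quad\mu'_D(x_D) := \mu_D(x_D), \tag{124a}$$

$$(\forall D\subseteq C,\ D\in\mathcal{E})\ (\forall x_D\in\mathcal{X}_D,\ \forall y\in\mathcal{X}_\omega)\quad\mu'_{D\cup\{\omega\}}(x_D,y) := \sum_{x_{C\setminus D}}\mu_C(x_C)\,[y(x_C)=y]. \tag{124b}$$

In particular, $\mu'_{C\cup\{\omega\}}(x_C,y) = \mu_C(x_C)[y(x_C)=y]$ and $\mu'_\omega(y) = \sum_{x_C}\mu_C(x_C)[y(x_C)=y]$. Clearly, solution $\mu'$ satisfies all those marginalization constraints that $\mu$ does. The additional marginalization constraints of $\Lambda'$ are given, for $B\subseteq C$, $A\subsetneq B$, $A,B\in\mathcal{E}$, by the couplings of $A$, $B$ and $A\cup\{\omega\}$ with $B\cup\{\omega\}$:

$$A\subsetneq B\cup\{\omega\},\quad B\subsetneq B\cup\{\omega\},\quad A\cup\{\omega\}\subsetneq B\cup\{\omega\}. \tag{125}$$

Marginalization for $B$ and $B\cup\{\omega\}$:

$$\sum_y\mu'_{B\cup\{\omega\}}(x_B,y) = \sum_y\sum_{x_{C\setminus B}}\mu_C(x_C)[y(x_C)=y] = \sum_{x_{C\setminus B}}\mu_C(x_C) = \mu_B(x_B) = \mu'_B(x_B). \tag{126}$$

Marginalization for $A$ and $B\cup\{\omega\}$:

$$\sum_{x_{B\setminus A},\,y}\mu'_{B\cup\{\omega\}}(x_B,y) = \sum_{x_{B\setminus A},\,y}\ \sum_{x_{C\setminus B}}\mu_C(x_C)[y(x_C)=y] = \sum_{x_{C\setminus A}}\mu_C(x_C) = \mu_A(x_A) = \mu'_A(x_A). \tag{127}$$

Marginalization for $A\cup\{\omega\}$ and $B\cup\{\omega\}$:

$$\sum_{x_{B\setminus A}}\mu'_{B\cup\{\omega\}}(x_B,y) = \sum_{x_{B\setminus A}}\ \sum_{x_{C\setminus B}}\mu_C(x_C)[y(x_C)=y] = \sum_{x_{C\setminus A}}\mu_C(x_C)[y(x_C)=y] = \mu'_{A\cup\{\omega\}}(x_A,y). \tag{128}$$

Therefore $\mu'\in\Lambda'$. Since we have the equality $f_C(x_C) = f'_{C\cup\{\omega\}}(x_C,y(x_C))$, there holds

$$\begin{aligned}\sum_{x_C}f_C(x_C)\mu_C(x_C) &= \sum_{x_C}f'_{C\cup\{\omega\}}(x_C,y(x_C))\mu_C(x_C)\\ &= \sum_{x_C,y}f'_{C\cup\{\omega\}}(x_C,y)[y=y(x_C)]\mu_C(x_C)\\ &= \sum_{x_C,y}f'_{C\cup\{\omega\}}(x_C,y)\mu'_{C\cup\{\omega\}}(x_C,y).\end{aligned} \tag{129}$$

For other components $D\in\mathcal{E}\setminus\{C\}$ we have

$$\sum_{x_D}f_D(x_D)\mu_D(x_D) = \sum_{x_D}f'_D(x_D)\mu'_D(x_D), \tag{130}$$

and for all $D\subsetneq C$, $D\in\mathcal{E}$, we have $f'_{D\cup\{\omega\}} = 0$. Let us denote the linear mapping (124) by $\Pi$. It follows that for all $\mu\in\Lambda$ there holds $\Pi\mu\in\Lambda'$ and, from (129), (130), that

$$\langle f,\mu\rangle = \langle f',\Pi\mu\rangle. \tag{131}$$

This proves that

$$\min_{\mu\in\Lambda}\langle f,\mu\rangle\ge\min_{\mu'\in\Lambda'}\langle f',\mu'\rangle, \tag{132}$$

i.e., relaxation $\Lambda'$ is not tighter than $\Lambda$.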

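Let $p'\colon\mathcal{X}'\to\mathcal{X}'$ be node-wise $\Lambda'$-improving for $f'$ and $p = \sigma\circ p'\circ\pi$, i.e., $p_s = p'_s$ for $s\in\mathcal{V}$. Let $P' = [p']$ be its linear extension. Similarly to (129) and (130), we can express parts of the scalar product

$$\langle f,[p]\mu\rangle = \sum_{D\in\mathcal{E}}\ \sum_{x_D\in\mathcal{X}_D}f_D(p_D(x_D))\,\mu_D(x_D) \tag{133}$$

for each $D\in\mathcal{E}$ as follows. For $D\in\mathcal{E}\setminus\{C\}$ we have

$$\sum_{x_D}f_D(p_D(x_D))\,\mu_D(x_D) \tag{134}$$

$$= \sum_{x_D}f'_D(p'_D(x_D))\,\mu'_D(x_D) = \sum_{x_D}f'_D(x_D)\,(P'\Pi\mu)_D(x_D). \tag{135}$$

For $C$, let $C' = C\cup\{\omega\}$; we have

$$\sum_{x_C}f_C(p_C(x_C))\,\mu_C(x_C) \tag{136a}$$

$$= \sum_{x_C}\sum_{\hat x_C}f_C(\hat x_C)[\hat x_C = p_C(x_C)]\mu_C(x_C) = \sum_{x_C}\sum_{\hat x_C}f'_{C'}(\hat x_C,y(\hat x_C))[\hat x_C = p'_C(x_C)]\mu_C(x_C) \tag{136b}$$

$$\le\sum_{x_C}\sum_{\hat x_C}f'_{C'}(\hat x_C,p'_\omega(y(x_C)))[\hat x_C = p'_C(x_C)]\mu_C(x_C)$$

$$= \sum_{\hat x_C,\hat y}\ \sum_{x_C,y}f'_{C'}(\hat x_C,\hat y)[\hat x_C = p'_C(x_C)][\hat y = p'_\omega(y)][y = y(x_C)]\mu_C(x_C) \tag{136c}$$

$$= \sum_{\hat x_C,\hat y}\ \sum_{x_C,y}f'_{C'}(\hat x_C,\hat y)[(\hat x_C,\hat y) = p'_{C'}(x_C,y)]\,\mu'_{C'}(x_C,y) \tag{136d}$$

$$= \sum_{x_C,y}f'_{C'}(x_C,y)\,(P'\mu')_{C'}(x_C,y). \tag{136e}$$

The inequality is due to $y(\hat x_C)\in\operatorname{argmin}_y f'_{C'}(\hat x_C,y)$. For all $D\subsetneq C$, $D\in\mathcal{E}$, there holds $f'_{D\cup\{\omega\}} = 0$. It follows that

$$\langle f,[p]\mu\rangle\le\langle f',[p']\Pi\mu\rangle\le\langle f',\Pi\mu\rangle = \langle f,\mu\rangle, \tag{137}$$

where the second inequality holds because $\Pi\mu\in\Lambda'$ and $p'$ is $\Lambda'$-improving. Therefore $[p]\in\mathbb{W}_f$. □


Proof (group_aux). Because $y(x_C)$ minimizes all group terms simultaneously, the arguments of the proof for clique_aux apply to each $H\in\mathcal{H}$. It follows that the final relation (137) is satisfied. □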

Theorem 4.6. Persistency by Higher Order Clique Reduction (HOCR) of Ishikawa [22] corresponds to an FLP-improving mapping.

Proof. Reductions used in HOCR are of the following form:

$$x_1x_2\cdots x_n = \min_{y_1,y_2,\dots,y_k}g(x,y), \tag{138}$$

where function $g(x,y)$ is by design partially separable so that the reduction decreases the order of the problem. This reduction can be implemented as follows:

• Use clique_aux to introduce new variables $y_1,\dots,y_k$, one at a time, e.g., first let $x_1x_2\cdots x_n = \min_{y_1}f^1(x,y_1)$, where $f^1(x,y_1) = \min_{y_2,\dots,y_k}g(x,y)$, and so on.

• Use reparametrize to rewrite the term $g(x,y)$, which gets assigned to the hyperedge over all variables $x_1,\dots,x_n$ and $y_1,\dots,y_k$, as a sum of terms over smaller subsets of variables, as guaranteed by the design of $g$.

• Use remove_zero to clean up the hypergraph so that all higher-order terms that are now zero are excluded.

The full HOCR reduction can be implemented by iterating the steps above. Starting from the FLP relaxation of the initial problem, the necessary reparametrize operations are feasible because we add all possible interactions during clique_aux. Each atomic reformulation has inclusion of relaxed-improving maps. It follows that the whole reformulation has inclusion of relaxed-improving maps. The persistency in the final reformulation is obtained with QPBO, which by Theorem 4.3 is an FLP-improving mapping. It follows that there is a corresponding FLP-improving mapping in the initial formulation. □
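As an example of a reduction of the form (138), the following sketch implements the positive-monomial quadratization of Ishikawa [22] as we recall it: with $S_1 = \sum_j x_j$, $S_2 = \sum_{j<l}x_jx_l$ and $n_d = \lfloor(d-1)/2\rfloor$ auxiliary variables $w_i$, $x_1\cdots x_d = \min_w\, S_2 + \sum_{i=1}^{n_d}w_i\big(c_{i,d}(2i - S_1) - 1\big)$, where $c_{i,d} = 1$ if $d$ is odd and $i = n_d$, and $c_{i,d} = 2$ otherwise. This transcription is ours and should be checked against [22]; the brute-force function verifies the identity for small $d$. Every summand of $g$ is at most quadratic in $(x,w)$, which is what makes the subsequent reparametrize and remove_zero steps possible.

```python
from itertools import product


def hocr_positive(x, w):
    """g(x, w) for the monomial x_1 ... x_d (d >= 2), following our transcription of [22].

    g = S2 + sum_{i=1}^{n_d} w_i (c_{i,d} (2 i - S1) - 1), with S1 = sum_j x_j,
    S2 = sum_{j<l} x_j x_l, n_d = (d - 1) // 2, c_{i,d} = 1 if d is odd and i = n_d,
    and c_{i,d} = 2 otherwise.
    """
    d = len(x)
    n_d = (d - 1) // 2
    s1 = sum(x)
    s2 = s1 * (s1 - 1) // 2            # equals sum_{j<l} x_j x_l for binary x
    value = s2
    for i in range(1, n_d + 1):
        c = 1 if (d % 2 == 1 and i == n_d) else 2
        value += w[i - 1] * (c * (2 * i - s1) - 1)
    return value


def reduction_holds(d):
    """Brute-force check of (138): x_1 ... x_d = min over w of g(x, w), for all binary x."""
    n_aux = (d - 1) // 2
    return all(
        min(hocr_positive(x, w) for w in product((0, 1), repeat=n_aux)) == int(all(x))
        for x in product((0, 1), repeat=d)
    )
```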

Theorem 4.7. Persistency by the method of Fix et al. [15] corresponds to an FLP-improving mapping.

Proof. Method [15] uses reductions of HOCR and a new reduction for a group of cliques [15, Theorem 3.1]. Let us show that the group reduction has a form that can be implemented using group_aux, reparametrize and remove_zero. The optimal value of the auxiliary variable $y$ in [15, Theorem 3.1] depends on the assignment of $x_C$ only: it is given by $y(x_C) = 1 - \prod_{j\in C}x_j$. Since all hyperedges $H\in\mathcal{H}$ contain $C$, the reduced expression [15, equation (2)] equals

$$\sum_{H\in\mathcal{H}}a_H\Big(y(x_C)\prod_{j\in C}x_j + (1-y(x_C))\prod_{j\in H\setminus C}x_j\Big), \tag{139}$$

where $a_H>0$, i.e., it has the form required by group_aux. The actual simplification of the problem is achieved by applying reparametrize and remove_zero similarly to HOCR.

It follows that reduction [15] can be implemented using operations clique_aux, group_aux, reparametrize and remove_zero. □
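The group reduction (139) can be checked in the same brute-force manner. A minimal sketch, with a hypothetical group of three hyperedges sharing $C = \{0,1\}$ and random positive weights $a_H$: the minimum over $y$ of (139) reproduces $\sum_H a_H\prod_{j\in H}x_j$, and $y(x_C) = 1-\prod_{j\in C}x_j$ minimizes every summand simultaneously, which is the property required by group_aux.

```python
from itertools import product
import random


def reduced_term(x, y, H, C, a_H):
    """One summand of (139): a_H (y prod_{j in C} x_j + (1 - y) prod_{j in H minus C} x_j)."""
    prod_C = int(all(x[j] for j in C))
    prod_rest = int(all(x[j] for j in H - C))
    return a_H * (y * prod_C + (1 - y) * prod_rest)


def y_opt(x, C):
    """Optimal auxiliary value y(x_C) = 1 - prod_{j in C} x_j."""
    return 1 - int(all(x[j] for j in C))


rng = random.Random(6)
C = frozenset({0, 1})
groups = [frozenset({0, 1, 2}), frozenset({0, 1, 3}), frozenset({0, 1, 2, 3})]  # all contain C
a = {H: rng.randint(1, 9) for H in groups}                                     # a_H > 0
labelings = list(product((0, 1), repeat=4))
```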

A.8. Bisubmodular Relaxations 

The section is organized as follows. We review definitions and the persistency property of bisubmodular functions. Then we define the sum of bisubmodular functions relaxation via the injection of the label space $\{0,1\}$ into $\{0,\frac12,1\}$ (with operation add_labels). The comparison result is then achieved in two steps. We show that persistency for a bisubmodular function is a BLP-improving mapping on 3 labels ($\{0,\frac12,1\}$) admissible for add_labels, and then apply the result that add_labels has inclusion of BLP-improving maps to transfer all persistencies by bisubmodular relaxations to the BLP relaxation of the initial problem.

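We follow the notation of [28]. A sum of bisubmodular functions (SoB) relaxation is constructed as follows. Let $\mathcal{X}_s = \{0,1\}$ and

$$\mathcal{K}^{1/2} = \{0,\tfrac12,1\}^{\mathcal{V}}. \tag{140}$$

Define binary operations $\sqcap,\sqcup\colon\mathcal{K}^{1/2}\times\mathcal{K}^{1/2}\to\mathcal{K}^{1/2}$ component-wise according to:

$$\begin{array}{c|ccc}\sqcap & 0 & \frac12 & 1\\\hline 0 & 0 & \frac12 & \frac12\\ \frac12 & \frac12 & \frac12 & \frac12\\ 1 & \frac12 & \frac12 & 1\end{array}\qquad\qquad\begin{array}{c|ccc}\sqcup & 0 & \frac12 & 1\\\hline 0 & 0 & 0 & \frac12\\ \frac12 & 0 & \frac12 & 1\\ 1 & \frac12 & 1 & 1\end{array} \tag{141}$$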


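Definition A.7. A function $f\colon\mathcal{K}^{1/2}\to\mathbb{R}$ is called bisubmodular if

$$(\forall x,y\in\mathcal{K}^{1/2})\quad f(x\sqcap y) + f(x\sqcup y)\le f(x) + f(y). \tag{142}$$

Definition A.8. A function $f'\colon\mathcal{K}^{1/2}\to\mathbb{R}$ is a sum of bisubmodular functions relaxation (SoB relaxation) for $f\colon\mathcal{X}\to\mathbb{R}$ if for every $C\in\mathcal{E}$ there holds:

• $f'_C\colon\mathcal{K}^{1/2}_C\to\mathbb{R}$ is bisubmodular;

• $(\forall C\in\mathcal{E},\ \forall x_C\in\mathcal{X}_C)\ f'_C(x_C) = f_C(x_C)$.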

The next lemma reviews persistency according to [28, Proposition 12].

Lemma A.9. Let $x^*\in\mathcal{K}^{1/2}$ be a minimizer of $f'$. Define the following mappings:

$$p'_s\colon\mathcal{K}^{1/2}_s\to\mathcal{K}^{1/2}_s\colon\ x_s\mapsto(x_s\sqcup x^*_s)\sqcup x^*_s; \tag{143a}$$

$$p_s\colon\mathcal{X}_s\to\mathcal{X}_s\colon\ x_s\mapsto\begin{cases}x^*_s, & \text{if } x^*_s\neq\frac12,\\ x_s, & \text{otherwise}.\end{cases} \tag{143b}$$

There holds:

(a) Autarky: $(\forall x\in\mathcal{K}^{1/2})\ f'(p'(x))\le f'(x)$;

(b) $p'(\mathcal{X})\subseteq\mathcal{X}$ and $p$ is the restriction of $p'$ to $\mathcal{X}$.

Proof. Part (a) follows from bisubmodularity (142) using that $x^*$ is a minimizer: for all $y\in\mathcal{K}^{1/2}$ there holds

$$f'(y\sqcup x^*)\le\big(f'(x^*) - f'(y\sqcap x^*)\big) + f'(y)\le f'(y). \tag{144}$$

It follows that $(\forall x\in\mathcal{K}^{1/2})$

$$f'((x\sqcup x^*)\sqcup x^*)\le f'(x\sqcup x^*)\le f'(x). \tag{145}$$

Part (b) can be verified explicitly by calculating the tables for mappings $p$ and $p'$. They are, respectively:

$$\begin{array}{c|ccc}x_s\backslash x^*_s & 0 & \frac12 & 1\\\hline 0 & 0 & 0 & 1\\ 1 & 0 & 1 & 1\end{array}\qquad\qquad\begin{array}{c|ccc}x_s\backslash x^*_s & 0 & \frac12 & 1\\\hline 0 & 0 & 0 & 1\\ \frac12 & 0 & \frac12 & 1\\ 1 & 0 & 1 & 1\end{array} \tag{146}$$

It is seen that for non-fractional $x_s$ the results match.

□
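The tables (141) and (146) and the autarky property of Lemma A.9 can be reproduced with a few lines of Python. The sketch encodes the label $\frac12$ as `Fraction(1, 2)`; the separable function `f` is a hypothetical example, bisubmodular because each unary term satisfies $h(\frac12)\le\frac12(h(0)+h(1))$, and its minimizer $x^*$ is found by enumeration.

```python
from fractions import Fraction
from itertools import product

H = Fraction(1, 2)
LABELS = (0, H, 1)


def meet(a, b):
    """Component-wise operation from table (141): equal labels stay, otherwise 1/2."""
    return a if a == b else H


def join(a, b):
    """Component-wise operation from table (141): 1/2 is neutral, 0 and 1 give 1/2."""
    if a == H:
        return b
    if b == H or a == b:
        return a
    return H


def p_prime(xs, xs_star):
    """(143a): x_s -> (x_s join x*_s) join x*_s."""
    return join(join(xs, xs_star), xs_star)


def p(xs, xs_star):
    """(143b): x*_s if x*_s is integral, x_s otherwise."""
    return xs_star if xs_star != H else xs


# Separable example with h(1/2) <= (h(0) + h(1)) / 2 in every coordinate.
h = [{0: 3, H: 1, 1: 5}, {0: 0, H: 2, 1: 6}, {0: 4, H: 3, 1: 2}]


def f(x):
    return sum(hs[xs] for hs, xs in zip(h, x))


K = list(product(LABELS, repeat=len(h)))
x_star = min(K, key=f)
```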


We now upgrade the autarky property of Lemma A.9(a) to the statement that $p'$ is BLP-improving for $f'$. In order to show this, we construct a reparametrization in which the improving inequality holds component-wise. This reparametrization is given by an optimal dual point, provided that it preserves bisubmodularity of all components. The next lemma shows that such a dual solution exists.

Lemma A.10. Let $f\colon\mathcal{K}^{1/2}\to\mathbb{R}$ be a sum of bisubmodular functions. BLP for $f$ admits an optimal dual solution $\varphi$ such that each component

$$f^{\varphi}_C(x_C) = f_C(x_C) - (A^T\varphi)_C(x_C) \tag{147}$$

is bisubmodular.

Proof. The plan of the proof is as follows:

• Reformulate the bisubmodular function $f\colon\mathcal{K}^{1/2}\to\mathbb{R}$ as a function $g\colon\mathbb{B}^{2\mathcal{V}}\to\mathbb{R}$.

• Show that there exists a symmetric optimal dual solution to this BLP. Such a symmetric solution defines a reparametrization which preserves the sum of bisubmodular functions property for $g$ and consequently for $f$.

The following reformulation of a bisubmodular function $f\colon\mathcal{K}^{1/2}\to\mathbb{R}$ as a function $g\colon\mathbb{B}^{2\mathcal{V}}\to\mathbb{R}$ is according to [28]. Node $s$ is represented by a pair of nodes $(s,s')$. Label $x_s\in\mathcal{K}^{1/2}$ is represented by a pair of labels $(u_s,u_{s'})$ as follows:

$$0\mapsto(1,0);\qquad 1\mapsto(0,1);\qquad\tfrac12\mapsto(0,0). \tag{148}$$

To a hyperedge $C$ there corresponds the hyperedge $C\cup C'$, where $C' = \{s'\mid s\in C\}$. In the hypergraph $(\mathcal{V}\cup\mathcal{V}',\mathcal{G})$ there are edges of the form $C\cup C'$, $\{s\}$ and $\{s'\}$ for $C\in\mathcal{E}$ and $s\in\mathcal{V}$. We will denote components $g_{C\cup C'}$ simply by $g_C$ for $|C|>1$. The energy $g$ is defined by

$$(\forall u\in\mathbb{B}^{C\cup C'})\quad g_C(u_C,u_{C'}) = f_C\Big(\frac{\bar u_C + u_{C'}}{2}\Big), \tag{149}$$

where $\bar u := 1-u$. By definition, $g$ has the symmetry property:

$$(\forall u\in\mathbb{B}^{C\cup C'})\quad g_C(u_C,u_{C'}) = g_C(\bar u_{C'},\bar u_C). \tag{150}$$

Let $\tilde{\mathcal{X}}_s = \{(u_s,u_{s'})\in\mathbb{B}^2\mid(u_s,u_{s'})\neq(1,1)\}$ and $\tilde{\mathcal{X}} = \prod_{s\in\mathcal{V}}\tilde{\mathcal{X}}_s$. For $u,v\in\tilde{\mathcal{X}}$ operations $\sqcup$, $\sqcap$ are defined by

$$u\sqcap v = u\wedge v, \tag{151a}$$

$$u\sqcup v = \operatorname{reduce}(u\vee v), \tag{151b}$$

where $\operatorname{reduce}(w)$ is the labeling obtained from $w$ by changing labels $(w_s,w_{s'})$ from $(1,1)$ to $(0,0)$ for all $s\in\mathcal{V}$. It can be seen that these definitions are consistent with equations (141). Components $g_C$ satisfy

$$(\forall u_C,v_C\in\tilde{\mathcal{X}}_C)\quad g_C(u_C\sqcap v_C) + g_C(u_C\sqcup v_C)\le g_C(u_C) + g_C(v_C). \tag{152}$$
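The claim that (151) is consistent with (141) under the encoding (148) can be verified exhaustively, since only nine label pairs are involved. A minimal sketch; the helper names are ours. It also checks that the argument $\frac12(\bar u + u')$ in (149) decodes (148) and is invariant under the flip $(u,u')\mapsto(\bar u',\bar u)$ behind the symmetry (150).

```python
from fractions import Fraction

H = Fraction(1, 2)
ENCODE = {0: (1, 0), 1: (0, 1), H: (0, 0)}        # (148): label -> (u_s, u_s')
DECODE = {pair: label for label, pair in ENCODE.items()}


def meet_pairs(u, v):
    """(151a): component-wise AND."""
    return (u[0] & v[0], u[1] & v[1])


def join_pairs(u, v):
    """(151b): component-wise OR, then reduce (1, 1) to (0, 0)."""
    w = (u[0] | v[0], u[1] | v[1])
    return (0, 0) if w == (1, 1) else w


def meet(a, b):
    """Table (141), left."""
    return a if a == b else H


def join(a, b):
    """Table (141), right."""
    if a == H:
        return b
    if b == H or a == b:
        return a
    return H


def decode_149(u, u_prime):
    """Argument of f_C in (149) for one variable: ((1 - u) + u') / 2."""
    return Fraction((1 - u) + u_prime, 2)
```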

With this reformulation we proceed as follows. Let 
ip be dual optimal to g. According to the hypergraph 
(V u V',G), there are only components ip s> cu c'(*), 
ip s >, cuc'W f° r C e £, |c| > 1, s e c, and i e B and 
components ip & tCuC / for C e £. Similarly to function 
g , we can denote the index c u c' as just C. 

If <P 0 ,c A 0 for some C e £, we can apply the trans¬ 
formation g c := g c - <^ 0iC ; g 0 := g 0 + ip 0>c . Clearly 
this constant transformation does not change bisub¬ 
modularity of g c . Without loss of generality let us 
assume now that ip 0 ^ c = 0 for all ceL 

Dual feasibility of BLP relaxation for each compo¬ 
nent Cue' where C e £, |c| > 1 reads (V« CuC /) 

g c (u c ,u c >) - ^ (Ps,c(u s ) + <p s ',c(u s ')) ^ 0. (153) 

sec 

And for unary terms (i.e., C = {s}), it is (Vw{ S)S q) 

g s (u s ,u s ’) + ^ (<Ps,d( u s') + <Ps',d(u s )) ^ 0. (154) 
ds{s} 

By symmetry of g, condition (153) is equivalent to 
(Vu cue') 

gc(u c /,u c ) - (Ps,c(u s ) + </V, c(u s ')) > 0 (155) 

SEC 

and, by flipping all bound variables u C uc'> t° (V« CuC /) 

gc(u c ,uc) - Yi {‘PsAus’) + <Ps',c(u s )) > o. (156) 
sec 
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Similarly, for unary terms there holds (Vu{ S)S /i) 

9s(u s ,u s >)+ 2 (‘Ps,o(u s ') + <p s ',d(u s )) >0. (157) 
ds{s} 


Therefore the dual point $\psi'$ with the following components is feasible:

$$\psi'_{s,C}(i) := \psi_{s',C}(i), \qquad \psi'_{s',C}(i) := \psi_{s,C}(i) \tag{158}$$

for $C \in \mathcal{E}$, $|C| > 1$, $s \in C$, $i \in \mathbb{B}$ (the components $\psi'_{0,C} = 0$ for $C \in \mathcal{E}$ are omitted in this and subsequent steps). The solution $\psi'$ is clearly optimal to the BLP of $g$, since

$$g_0^{\psi'} = g_0^{\psi} = g_0.$$


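Since the dual of the BLP is a linear program, its set of optimal solutions is convex, so the average of the two optimal solutions $\psi$ and $\psi'$ is optimal as well. Therefore the following symmetrized solution is optimal:

$$\bar\psi_{s,C}(u_s) := \tfrac12\big(\psi_{s,C}(u_s) + \psi_{s',C}(u_s)\big), \qquad \bar\psi_{s',C}(u_{s'}) := \tfrac12\big(\psi_{s',C}(u_{s'}) + \psi_{s,C}(u_{s'})\big). \tag{159}$$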
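The argument from (153) to (159) can be sanity-checked numerically on a single hyperedge. The Python sketch below assumes $|C| = 2$, stores $g_C$ as a dictionary over pairs $(u_C, u_{C'})$ that is symmetric by construction, shifts it by a constant so that a random $\psi$ satisfies (153), and then verifies that the swapped point (158) and the average (159) satisfy (153) as well. It checks only constraint (153) for one hyperedge, not the unary constraints (154) or optimality, and all names are hypothetical.

```python
import random
from itertools import product

K = 2  # |C|: nodes s = 0, ..., K-1 of C, each with a copy s' in C'
CONFIGS = list(product((0, 1), repeat=K))


def slack(g, psi, psi_p, uC, uCp):
    """Left-hand side of (153) for one configuration (u_C, u_C')."""
    return g[(uC, uCp)] - sum(psi[s][uC[s]] + psi_p[s][uCp[s]] for s in range(K))


def feasible(g, psi, psi_p, tol=1e-9):
    """Constraint (153) for all (u_C, u_C'), up to a floating-point tolerance."""
    return all(slack(g, psi, psi_p, a, b) >= -tol for a in CONFIGS for b in CONFIGS)


random.seed(0)
# A random g_C that is symmetric under exchanging u_C and u_C'.
g = {}
for a in CONFIGS:
    for b in CONFIGS:
        g[(a, b)] = g[(b, a)] if (b, a) in g else random.uniform(-1, 1)

# Random dual components: psi[s][i] = psi_{s,C}(i), psi_p[s][i] = psi_{s',C}(i).
psi = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(K)]
psi_p = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(K)]

# Adding a constant keeps g symmetric and makes (psi, psi_p) feasible.
shift = -min(slack(g, psi, psi_p, a, b) for a in CONFIGS for b in CONFIGS)
g = {key: val + shift for key, val in g.items()}

# (158): exchange the roles of s and s'.
swapped, swapped_p = psi_p, psi
# (159): average of psi and the swapped point; unprimed and primed parts coincide.
sym = [[(psi[s][i] + psi_p[s][i]) / 2 for i in range(2)] for s in range(K)]

print(feasible(g, psi, psi_p), feasible(g, swapped, swapped_p), feasible(g, sym, sym))
```

The averaged point is feasible because the slack in (153) is affine in the dual variables: it equals the mean of the slacks of the original and the swapped points.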


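Let us check that it is bisubmodular. We map this dual solution back to the BLP for $f'$ by reversing (149) as

$$\begin{aligned}
\varphi_{s,C}(0) &:= \bar\psi_{s,C}(1) + \bar\psi_{s',C}(0) = \bar\psi_{s,C}(0) + \bar\psi_{s',C}(1),\\
\varphi_{s,C}(1) &:= \bar\psi_{s,C}(0) + \bar\psi_{s',C}(1) = \bar\psi_{s,C}(1) + \bar\psi_{s',C}(0),\\
\varphi_{s,C}(\tfrac12) &:= \bar\psi_{s,C}(0) + \bar\psi_{s',C}(0)\\
&\;= \tfrac12\big(\bar\psi_{s,C}(0) + \bar\psi_{s,C}(1)\big) + \tfrac12\big(\bar\psi_{s',C}(0) + \bar\psi_{s',C}(1)\big).
\end{aligned} \tag{160}$$

It satisfies $\varphi_{s,C}(\tfrac12) = \tfrac12\big(\varphi_{s,C}(0) + \varphi_{s,C}(1)\big)$, and therefore both $\varphi_{s,C}$ and $-\varphi_{s,C}$ are bisubmodular.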
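The last claim can be checked by brute force. The Python sketch below assumes the three labels of $\mathcal{K}^{1/2}$ are encoded as `0`, `1/2`, `1`, with $\sqcap$ and $\sqcup$ induced by (151a), (151b) ($\tfrac12$ corresponds to the pair $(0,0)$). It confirms, for small integer values, that a unary function $h$ is bisubmodular together with its negation exactly when $h(\tfrac12) = \tfrac12(h(0) + h(1))$. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
LABELS = [Fraction(0), HALF, Fraction(1)]


def meet(x, y):
    """x ⊓ y on K^{1/2}: equal labels stay, different labels give 1/2."""
    return x if x == y else HALF


def join(x, y):
    """x ⊔ y on K^{1/2}: 1/2 is neutral, and {0, 1} is reduced to 1/2."""
    if x == y:
        return x
    if x == HALF:
        return y
    if y == HALF:
        return x
    return HALF


def is_bisubmodular(h):
    return all(h[meet(x, y)] + h[join(x, y)] <= h[x] + h[y]
               for x in LABELS for y in LABELS)


def is_bimodular(h):
    """True iff both h and -h are bisubmodular."""
    return is_bisubmodular(h) and is_bisubmodular({x: -v for x, v in h.items()})


# Exhaustive check on small integer values: bimodular iff 2 h(1/2) = h(0) + h(1).
agree = all(
    is_bimodular(dict(zip(LABELS, map(Fraction, vals))))
    == (2 * vals[1] == vals[0] + vals[2])
    for vals in product(range(-2, 3), repeat=3)
)
print(agree)  # True
```

Only the pair $\{0, 1\}$ gives a non-trivial inequality, $2h(\tfrac12) \le h(0) + h(1)$; requiring it for $-h$ as well turns it into the equality stated above.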

It remains to show that $\varphi$ is optimal to the BLP of $f'$. While it is well known that the BLP relaxation is tight for an SoB function $f'\colon \mathcal{K}^{1/2} \to \mathbb{R}$, e.g. [66], it is not obvious that the BLP relaxation for the reformulation $g\colon \mathbb{B}^{2V} \to \mathbb{R}$ is tight as well. Let us show that this is the case. It will then follow that $\varphi$, constructed from an optimal dual solution $\psi$ to the BLP of $g$, is optimal to the BLP of $f'$.


Statement A.11. The BLP relaxation for a sum of bisubmodular functions $g\colon \mathbb{B}^{2V} \to \mathbb{R}$ is tight.


Proof. The scheme of the proof is similar to, e.g., [12, T.6.2] or [52], who considered sums of submodular functions. We construct a primal integer optimal solution from an arc-consistent optimal dual solution. In the construction we need the reparametrized problem $g^{\psi}$ to be a sum of bisubmodular functions. Therefore we need an arc-consistent symmetric optimal dual solution.


We start by taking a pair $(\mu, \psi)$ that satisfies strict complementary slackness for the BLP of $g$. Since arc consistency is a necessary condition for strict complementarity, $\psi$ is arc consistent. As was shown before, $\psi'$ defined by equations (158) is dual optimal to the BLP of $g$. It is arc consistent because $\psi$ is arc consistent. By taking the symmetrized dual solution $\bar\psi$ defined by equations (159) we obtain an optimal, symmetric, arc-consistent dual solution. By symmetry, it preserves component-wise bisubmodularity.

Let now $\psi := \bar\psi$. We construct an integer solution as

$$u^*_s = \bigwedge \{ i \in \mathbb{B} \mid g^{\psi}_s(i) = 0 \}. \tag{161}$$

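In order to show that $u^*$ is optimal, we prove complementarity with $\psi$. Let $C \in \mathcal{E}$. By arc consistency, for every $s \in C$ there exists $u^{(s)}_C$ such that $u^{(s)}_s = u^*_s$ and $g^{\psi}_C(u^{(s)}_C) = 0$. It also follows that $(\forall t \in C,\ t \neq s)$ $g^{\psi}_t(u^{(s)}_t) = 0$, and hence $u^{(s)}_t \ge u^*_t$. Consequently, $u^*_C = \bigwedge_{s \in C} u^{(s)}_C$. Since $g^{\psi}_C \ge 0$ by dual feasibility, applying the bisubmodularity inequality to two zeros $u, v$ of $g^{\psi}_C$ gives $g^{\psi}_C(u \sqcap v) = 0$, where $u \sqcap v = u \wedge v$. Repeating this argument, by component-wise bisubmodularity we have

$$g^{\psi}_C\Big(\bigwedge_{s \in C} u^{(s)}_C\Big) = 0.$$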

From feasibility and complementary slackness it follows that $u^*$ is optimal. Therefore the BLP relaxation for $g$ is tight. □

It follows that $\varphi$ constructed in (160) is not only feasible to the BLP of $f'$ but also achieves the optimal dual objective ${f'}^{\varphi}_0 = g^{\psi}_0 = \min_{u \in \mathbb{B}^{2V}} E_g(u)$. □

Lemma A.12. The mapping $p'$ defined in Lemma A.9 by an optimal solution $x^* \in \mathcal{K}^{1/2}$ is BLP-improving for $f'$.

Proof. Let $\varphi$ provide a component-wise bisubmodular optimal reparametrization ${f'}^{\varphi}$ for the BLP of $f'$, which exists by Lemma A.10. Since the BLP for $f'$ is tight, $\varphi$ and $\delta(x^*)$ must satisfy complementary slackness. For every hyperedge $C \in \mathcal{E}$, $C \neq \emptyset$, we have ${f'}^{\varphi}_C \ge 0$ by dual feasibility and, by complementary slackness,

$${f'}^{\varphi}_C(x^*_C) = 0. \tag{162}$$

Therefore, the labeling $x^*_C$ is a minimizer of ${f'}^{\varphi}_C$. Since ${f'}^{\varphi}_C$ is bisubmodular, it follows by the same argument as in equations (144), (145) that for each hyperedge $C$ we have the component-wise autarky property:

$$(\forall x_C \in \mathcal{K}_C)\quad {f'}^{\varphi}_C(p'_C(x_C)) \le {f'}^{\varphi}_C(x_C). \tag{163}$$

By the characterization Theorem 3.3(c), the mapping $p'$ is BLP-improving for $f'$. □
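Property (163) is a finite condition on each hyperedge and can be verified exhaustively for small domains. The Python sketch below is generic: `p_C`, `f_good` and `f_bad` are hypothetical illustrations (the mapping $p'$ of Lemma A.9 is not reproduced here), and the code only checks $f_C(p_C(x_C)) \le f_C(x_C)$ for every $x_C$ over the labels of $\mathcal{K}^{1/2}$.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
K_HALF = (Fraction(0), HALF, Fraction(1))  # label set of K^{1/2}


def is_componentwise_autarky(f_C, p_C, domains):
    """Check (163): f_C(p_C(x_C)) <= f_C(x_C) for every x_C in the product of domains."""
    return all(f_C(p_C(x)) <= f_C(x) for x in product(*domains))


def p_C(x):
    """Hypothetical mapping: fix the first variable to 1, keep the second."""
    return (Fraction(1), x[1])


def f_good(x):
    # f_good(x) - f_good(p_C(x)) = (1 - x0)(1 - x1) >= 0 on [0, 1]^2.
    return (1 - x[0]) + x[0] * x[1]


def f_bad(x):
    return x[0]


print(is_componentwise_autarky(f_good, p_C, [K_HALF, K_HALF]))  # True
print(is_componentwise_autarky(f_bad, p_C, [K_HALF, K_HALF]))   # False
```

The check is exponential in $|C|$, which is acceptable only because the property is required per hyperedge rather than for the whole problem.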
Theorem 4.8. Persistency by SoB relaxation [27] corresponds to a BLP-improving mapping.

Proof. Let $p'$ and $p$ be the mappings defined in Lemma A.9, corresponding to the persistency [27]. We need to show that the mapping $p$ is BLP-improving for $f$. By Lemma A.12, $p'$ is BLP-improving for $f'$. The discrete relaxation $f'$ is obtained from $f$ with add_labels, which has the inclusion property for admissible BLP-improving maps (Theorem A.5). By Lemma A.9(b), $p'$ is admissible, and thus Theorem A.5 applies. □


A.9. Persistency in 0-1 Polynomial Programming by Adams et al.

The 0-1 polynomial programming problem is the following optimization problem:

$$\min_{x} \sum_{J \subseteq V} c_J \prod_{j \in J} x_j, \tag{PP}$$

where $x$ is binary and the coefficients $c_J \in \mathbb{R}$ may be zero for some $J$. The objective of (PP) is a multilinear polynomial; any polynomial in 0-1 variables can be written uniquely in this form by applying the identity $x_j^2 = x_j$. Adams et al. [2] considered a hierarchy of relaxations of Sherali and Adams [58] for this problem. A relaxation of level $d$ can be constructed assuming that $c_J = 0$ for all $|J| > d$.
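The multilinear form of (PP) can be illustrated with a small Python sketch, assuming a polynomial is stored as a dictionary from monomials `frozenset(J)` to coefficients $c_J$. Because $x_j^2 = x_j$, multiplying two monomials amounts to taking the union of their index sets. The helpers `multiply`, `evaluate` and `level` are hypothetical.

```python
from itertools import product


def multiply(p, q):
    """Product of multilinear polynomials; monomials merge by set union (x_j**2 = x_j)."""
    out = {}
    for J1, a in p.items():
        for J2, b in q.items():
            J = J1 | J2
            out[J] = out.get(J, 0) + a * b
    return {J: c for J, c in out.items() if c != 0}


def evaluate(p, x):
    """Value of sum_J c_J prod_{j in J} x_j at a 0-1 vector x."""
    return sum(c for J, c in p.items() if all(x[j] for j in J))


def level(p):
    """Largest |J| with a non-zero coefficient (the d needed for a level-d relaxation)."""
    return max((len(J) for J in p), default=0)


# (x0 + x1 - 1)^2 reduces to 1 - x0 - x1 + 2 x0 x1 on binary variables.
p = {frozenset({0}): 1, frozenset({1}): 1, frozenset(): -1}
sq = multiply(p, p)
print(sorted((sorted(J), c) for J, c in sq.items()))
# [([], 1), ([0], -1), ([0, 1], 2), ([1], -1)]
```

Squaring a degree-1 polynomial yields a level-2 instance, so a level-$d$ relaxation must be chosen with $d$ at least the result of `level`.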

To match their result we need the following:

• A hypergraph $(V, \mathcal{E})$, where $\mathcal{E}$ contains all subsets of $V$ of cardinality up to $d$ (in case $d = 2$ it is a fully connected graph):
$$\mathcal{E} = \{J \subseteq V \mid |J| \le d\}.$$

• Represent the problem (PP) as energy minimization with the terms
$$f_J(x_J) = \begin{cases} c_J & \text{if } x_J = 1_J,\\ 0 & \text{otherwise.} \end{cases} \tag{164}$$

• Consider the FLP relaxation.
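The construction (164) can be sketched in Python as follows, assuming the coefficients are given as a dictionary keyed by `frozenset(J)` and that $\mathcal{E}$ includes the empty set to carry the constant term. The helpers `hyperedges`, `make_term` and `energy` and the example coefficients are hypothetical; the sketch checks that the energy coincides with the polynomial (PP) on all binary labelings.

```python
from itertools import combinations, product


def hyperedges(n, d):
    """E = {J ⊆ V : |J| <= d} for V = {0, ..., n-1}; the empty set holds the constant term."""
    return [frozenset(J) for k in range(d + 1) for J in combinations(range(n), k)]


def make_term(c_J, J):
    """Term (164): f_J(x_J) = c_J if all variables of J are 1, otherwise 0."""
    def f_J(x):
        return c_J if all(x[j] for j in J) else 0
    return f_J


def energy(c, n, d):
    """Sum of the terms (164) over all hyperedges; missing coefficients are zero."""
    terms = [make_term(c.get(J, 0), J) for J in hyperedges(n, d)]
    return lambda x: sum(f_J(x) for f_J in terms)


# Hypothetical level-2 instance on V = {0, 1, 2}.
c = {frozenset(): 1, frozenset({0}): 2, frozenset({1}): -3, frozenset({0, 2}): 4}
E = energy(c, n=3, d=2)


def poly(x):
    return sum(cJ for J, cJ in c.items() if all(x[j] for j in J))


print(all(E(x) == poly(x) for x in product((0, 1), repeat=3)))  # True
```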

The persistency result [2, Lemma 3.2] can be described as follows. Their lemma partitions the set of nodes as $V = N^+ \cup N^- \cup N^*$. A sufficient condition is proposed implying that the partial assignment $x_{N^-} = 0$, $x_{N^+} = 1$ is globally optimal.

We interpret their result in the dual decomposition framework. The hypergraph $(V, \mathcal{E})$ is split into two parts:

• Nodes $V_1 = N^*$, hyperedges $\mathcal{E}_1 = \{J \in \mathcal{E} \mid J \subseteq N^*\}$.

• Nodes $V_2 = N^+ \cup N^- \cup B$, hyperedges $\mathcal{E}_2 = \{J \in \mathcal{E} \mid J \not\subseteq N^*\}$, where $B$ is the following boundary set:
$$B = \{v \in N^* \mid (\exists J \in \mathcal{E})\ v \in J,\ J \not\subseteq N^*\}. \tag{165}$$


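It can be seen that the hyperedges $\mathcal{E}_1$, $\mathcal{E}_2$ form a partition of $\mathcal{E}$, whereas the node sets $V_1$, $V_2$ overlap over $B$. According to the two hypergraphs we define the decomposition

$$f(x) = f^1(x_{V_1}) + f^2(x_{V_2}), \tag{166}$$

where

$$f^1\colon \mathbb{B}^{V_1} \to \mathbb{R}\colon x \mapsto \sum_{J \in \mathcal{E}_1} c_J \prod_{j \in J} x_j, \tag{167a}$$

$$f^2\colon \mathbb{B}^{V_2} \to \mathbb{R}\colon x \mapsto \sum_{J \in \mathcal{E}_2} c_J \prod_{j \in J} x_j. \tag{167b}$$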
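The split (165) to (167b) is purely combinatorial, so it can be sketched directly in Python. The instance below (partition, hyperedges and coefficients) is hypothetical and deliberately not fully connected, so that the boundary $B$ is a proper subset of $N^*$; the helpers `split` and `value` are illustrative names.

```python
from itertools import product


def split(E, N_star):
    """E_1, E_2 of the decomposition and the boundary set B of (165)."""
    E1 = [J for J in E if J <= N_star]
    E2 = [J for J in E if not J <= N_star]
    B = {v for J in E2 for v in J if v in N_star}
    return E1, E2, B


def value(c, hyperedges, x):
    """sum_{J in hyperedges} c_J prod_{j in J} x_j."""
    return sum(c[J] for J in hyperedges if all(x[j] for j in J))


# Hypothetical instance: N+ = {0}, N- = {1}, N* = {2, 3}.
N_plus, N_minus, N_star = {0}, {1}, frozenset({2, 3})
E = [frozenset(s) for s in ([], [0], [1], [2], [3], [0, 2], [2, 3])]
c = {J: k + 1 for k, J in enumerate(E)}  # arbitrary coefficients

E1, E2, B = split(E, N_star)
V2 = N_plus | N_minus | B
print(sorted(B), sorted(V2))  # [2] [0, 1, 2]
```

Every hyperedge of $\mathcal{E}_2$ lies inside $V_2$, which is what makes $f^2$ a function of $x_{V_2}$ alone.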



If we found an optimal solution to $f^1$ and an optimal solution to $f^2$, and they happened to be consistent over the overlap $B$, we would have obtained an optimal solution to $f$.

We will show that the conditions of Adams et al. [2] imply that an arbitrary solution $x_{V_1}$ to $f^1$ defines an optimal solution $x'$ to $f^2$ given by

$$x'_s = \begin{cases} x_s, & s \in B,\\ 0, & s \in N^-,\\ 1, & s \in N^+. \end{cases} \tag{168}$$

In other words, extending any $x_{V_1}$ to $V_2$ by (168) yields an optimal solution to $f^2$. In fact, under these conditions, what happens inside the problem $f^1$ is irrelevant.

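Recall that the relaxation [2] is obtained by introducing a relaxed variable $w_J \in [0, 1]$ in place of every product $\prod_{s \in J} x_s$. The constraints are represented by the following linear forms:

$$f_d(J_1, J_2) = \sum_{J' \subseteq J_2} (-1)^{|J'|}\, w_{J_1 \cup J'} \tag{169}$$

for each $J_1 \cap J_2 = \emptyset$ and $|J_1 \cup J_2| = d$. The relaxation, denoted LP($d/n$), is given by

$$\begin{aligned}
\min\ & \sum_{J \subseteq V} c_J w_J\\
\text{s.t.}\ & (\forall J_1, J_2 \mid |J_1 \cup J_2| = d,\ J_1 \cap J_2 = \emptyset)\quad f_d(J_1, J_2) \ge 0.
\end{aligned} \tag{170}$$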
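For an integral point, $w_J = \prod_{j \in J} x_j$, the form (169) is the inclusion–exclusion expansion of $\prod_{j \in J_1} x_j \prod_{j \in J_2} (1 - x_j)$, which is why the constraints $f_d(J_1, J_2) \ge 0$ are valid for all binary solutions. A small Python sketch verifying this identity exhaustively for $|V| = 3$ (the helpers `subsets`, `sa_form` and `integral_w` are hypothetical names):

```python
from itertools import combinations, product


def subsets(S):
    """All subsets of S as frozensets."""
    S = sorted(S)
    return [frozenset(T) for k in range(len(S) + 1) for T in combinations(S, k)]


def sa_form(w, J1, J2):
    """Linear form (169): sum over J' ⊆ J2 of (-1)^{|J'|} w_{J1 ∪ J'}."""
    J1 = frozenset(J1)
    return sum((-1) ** len(Jp) * w[J1 | Jp] for Jp in subsets(J2))


def integral_w(x):
    """w_J = prod_{j in J} x_j for a 0-1 vector x (so w of the empty set is 1)."""
    return {J: int(all(x[j] for j in J)) for J in subsets(range(len(x)))}


# For integral w, f_d(J1, J2) equals prod_{J1} x_j * prod_{J2} (1 - x_j).
ok = True
for x in product((0, 1), repeat=3):
    w = integral_w(x)
    for J in subsets(range(3)):
        for J1 in subsets(J):
            J2 = J - J1
            expected = all(x[j] == 1 for j in J1) and all(x[j] == 0 for j in J2)
            ok = ok and sa_form(w, J1, J2) == int(expected)
print(ok)  # True
```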


This relaxation can be matched to FLP by the relation

$$\mu_J(x_J) = f_{|J|}(J_1, J_2), \tag{171}$$

where $J_1 \cap J_2 = \emptyset$, $J_1 \cup J_2 = J$, $x_{J_1} = 1$, $x_{J_2} = 0$. In particular,

$$\mu_J(1_J) = w_J, \tag{172}$$

where $1_J$ is the $|J|$-vector of ones. For $w$ feasible to LP($d/n$), the terms $f_d(J_1, J_2)$ are non-negative and satisfy marginalization constraints according to [2, equation 2.3]. It follows from these marginalization constraints that a feasible solution satisfies the further inequality constraints

$$(\forall p < d,\ \forall J_1, J_2 \mid |J_1 \cup J_2| = p,\ J_1 \cap J_2 = \emptyset)\quad f_p(J_1, J_2) \ge 0.$$

This ensures that the corresponding solution $\mu$ is feasible to FLP, and hence the relaxations are identical.
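The relation (171) can be sketched in Python, assuming $w$ is stored as a dictionary indexed by `frozenset(J)` with $w_\emptyset = 1$. The sketch illustrates that the marginalization constraints of $\mu$ hold as algebraic identities of the forms (169) for any such $w$, because $f(J_1 \cup \{j\}, J_2) + f(J_1, J_2 \cup \{j\}) = f(J_1, J_2)$; by the same identity, each lower-level form is a sum of level-$d$ forms, so non-negativity at level $d$ carries over. Names are hypothetical, and exact `Fraction` arithmetic avoids rounding.

```python
import random
from fractions import Fraction
from itertools import combinations, product


def subsets(S):
    S = sorted(S)
    return [frozenset(T) for k in range(len(S) + 1) for T in combinations(S, k)]


def sa_form(w, J1, J2):
    """Linear form (169)."""
    return sum((-1) ** len(Jp) * w[frozenset(J1) | Jp] for Jp in subsets(J2))


def mu(w, J, xJ):
    """(171): mu_J(x_J) = f_{|J|}(J1, J2), J1 = {j : x_j = 1}, J2 = {j : x_j = 0}.

    xJ lists the labels of the variables of J in increasing order of j."""
    J = sorted(J)
    J1 = {j for j, v in zip(J, xJ) if v == 1}
    J2 = {j for j, v in zip(J, xJ) if v == 0}
    return sa_form(w, J1, J2)


def check_marginalization(w, n):
    """Summing mu_J over x_j gives mu_{J - {j}}, for every J, j in J and remaining labels."""
    for J in subsets(range(n)):
        Js = sorted(J)
        for k, j in enumerate(Js):
            rest = Js[:k] + Js[k + 1:]
            for y in product((0, 1), repeat=len(rest)):
                total = sum(mu(w, Js, y[:k] + (v,) + y[k:]) for v in (0, 1))
                if total != mu(w, rest, y):
                    return False
    return True


# Arbitrary (not necessarily feasible) w with w of the empty set equal to 1.
random.seed(1)
n = 3
w = {J: Fraction(random.randint(0, 4), 4) for J in subsets(range(n))}
w[frozenset()] = Fraction(1)
print(check_marginalization(w, n))  # True
```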
The persistency lemma formulated in [2, Lemma 3.2] 
has the following form. 

Lemma A.13 (Lemma 3.2 of Adams et al. [2]). Let $N^+ \cup N^- \cup N^*$ be a partition of $V$. Let there exist dual multipliers satisfying a certain sufficient condition (as defined in [2]). Then there exists an optimal solution to LP($d/n$) having $w_s = 1$ for $s \in N^+$ and $w_s = 0$ for $s \in N^-$.


For the interested reader we remark that the sufficient conditions of their lemma ensure a dual feasible solution for $f^2$ which is complementary to the solution defined by $N^+$, $N^-$ on $N^+ \cup N^-$ and is zero on all boundary constraints $J \subseteq B$. This ensures complementarity with any feasible primal solution consistent with $N^+$, $N^-$. We do not prove this claim formally, but use the existing lemma together with the observation that their sufficient condition does not depend on the coefficients $\{c_J \mid J \subseteq N^*\}$.


Lemma A.14. Assume that the conditions of Lemma A.13 are satisfied. Let $w_s = 1$ for $s \in N^+$, $w_s = 0$ for $s \in N^-$, and let $w$ be feasible to LP($d/n$). Then $w$ is optimal to LP($d/|V_2|$) for $f^2$.


Proof. It can be seen from [2, equation 3.4d] that if the conditions of their lemma are satisfied, then they are also satisfied with the coefficients $c_J = 0$ for all $J \subseteq N^*$. Since $w$ is feasible, it is also feasible to their equation 3.5[d]. The objective of the latter is identically zero; therefore $w$ is optimal to their equation 3.5[d]. Lemma A.13 then proves that $w$ is optimal to LP($d/n$). Since we made $f^1$ zero, $w$ is optimal to LP($d/|V_2|$) for $f^2$. □


With this refined result we can easily prove the relaxed-improving property. Define the mapping

$$p_s(x_s) = \begin{cases} 0, & s \in N^-,\\ 1, & s \in N^+,\\ x_s, & \text{otherwise}. \end{cases} \tag{173}$$
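At the level of integer labelings, the mapping (173) simply overwrites the variables in $N^+ \cup N^-$ and leaves the variables of $N^*$ untouched. A minimal Python sketch, where the helper `make_p` and the partition are hypothetical and the relaxed action $[p]\mu$ on higher-order marginals is not modeled:

```python
from itertools import product


def make_p(N_plus, N_minus):
    """Mapping (173): x_s -> 1 on N+, x_s -> 0 on N-, x_s unchanged elsewhere."""
    def p(x):
        return tuple(1 if s in N_plus else 0 if s in N_minus else xs
                     for s, xs in enumerate(x))
    return p


# Hypothetical partition of V = {0, 1, 2, 3}: N+ = {0}, N- = {1}, N* = {2, 3}.
p = make_p({0}, {1})
print(p((0, 1, 1, 0)))  # (1, 0, 1, 0)

# p is idempotent: applying it twice changes nothing further.
idempotent = all(p(p(x)) == p(x) for x in product((0, 1), repeat=4))
print(idempotent)  # True
```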


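Theorem 4.9. Persistency by the sufficient condition of Adams et al. [2, Lemma 3.2] corresponds to an FLP-improving mapping.

Proof. We need to show that the mapping $p$ is FLP-improving. Let $\mu \in \Lambda$. Then $\mu' := [p]\mu \in \Lambda$ is optimal to $f^2$ by Lemma A.14. At the same time, for all $J \in \mathcal{E}_1$ (i.e., $J \subseteq N^*$) we have $\mu'_J = \mu_J$, since $p$ is the identity on $N^*$, and hence $\langle f^1, \mu' \rangle = \langle f^1, \mu \rangle$. It follows that

$$\langle f, [p]\mu \rangle = \langle f^1, [p]\mu \rangle + \langle f^2, [p]\mu \rangle \le \langle f^1, \mu \rangle + \langle f^2, \mu \rangle = \langle f, \mu \rangle. \tag{174}$$

□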

We obtained that FLP maximum persistency dominates the persistency result of Adams et al. [2]. In the case of strong persistency, the former is found by Algorithm 1.

In the pairwise case, the conditions of [2] are always satisfied for the same persistency assignment as the roof dual. However, in the higher-order case our conditions are less restrictive, as can be seen from the dual representation (29): we require fewer inequalities than the complementary slackness imposed on all solutions of the unassigned part.